Longest Common Subsequence

Given two sequences and wish to find a maximum-length subsequence of X and Y.


Definition

Subsequence: Given a sequence X = <x1, x2, ......, xm>, another sequence Z = <z1, z1, ......zk> is a subsequence of X if there exists a strictly increasing sequence <i1, i2, ......ik> of indices of X such that for all j = 1, 2, ...... k, we have xij = zj.

Common Subsequence: Given two sequences X and Y, we say that s sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.

Theorem: Optimal substructure of an LCS
Let X = <x1, x2,..., xm> and Y = <y1, y2,...,yn> be sequences, and 
let Z = <z1, z2,...,zk> be any LCS of X and Y.
1. If xm = yn, then z_k = x_m = y_n and Z_(k-1) is an LCS of X_(m-1) 
and Y_(n-1).
2. If xm != yn, then z_k != x_m implies that Z is an LCS of X_(m-1)
and Y.
3. If xm != yn, then z_k != y_n implies that Z is an LCS of X and 
Y_(n-1).

Longest Common Subsequence

Analysis:

  • Subproblem: f[i][j]: the length of LCS ending at position i of string A and position j of string B

  • Initialize: f[i][0]=0 and f[0][j]=0. Length of LCS for a string and an empty string is 0

  • Function: if A.charAt(i-1)==B.charAt(j-1), f[i][j]=f[i-1][j-1]+1; if A.charAt(i-1) != B.charAt(j-1), f[i][j] = MAX(f[i-1][j], f[i][j-1])

  • Answer: f[A.length()][B.length()]

public int longestCommonSubsequence(String A, String B) {

        if (A == null || B == null)
            return 0;

        int M = A.length();
        int N = B.length();
        int[][] LCS = new int[M+1][N+1];

        // Initialization
        // Could skip cause Java automatically initialize arrays
        for (int i=0; i<=M; i++) {
            LCS[i][0] = 0;
        }
        for (int j=1; j<=N; j++) {
            LCS[0][j] = 0;
        }
        // Function
        for (int i=1; i<=M; i++) {
            for (int j=1; j<=N; j++) {
                LCS[i][j] = Math.max(LCS[i-1][j], LCS[i][j-1]);
                if (A.charAt(i-1) == B.charAt(j-1))
                    LCS[i][j] = LCS[i-1][j-1]+1;
            }
        }
        // Answer
        return LCS[M][N];
    }

Longest Common Substring

Analysis: Substring should occur continuously in original string. This is different with subsequence.

  • Subproblem: f[i][j]: common substring length with first i elements of s1 and first j elements of s2, not need to be the longest!

  • Initialize: f[i][0]=f[0][1]=0, cause no common substring with empty string

  • Function: if s1.charAt(i-1)==s2.charAt(j-1), f[i][j]=f[i-1][j-1]+1; else, f[i][j]=0

  • Answer: Need to search through the whole 2-dimentional array to get the max_value

public int longestCommonSubstring(String A, String B) {

        if (A == null || B == null || A.length() == 0 || B.length() == 0)
            return 0;

        int M = A.length();
        int N = B.length();
        int[][] dp = new int[M+1][N+1];

        // Initialize
        // No need to initialize, arrays initialize to 0 in Java

        // Function
        for (int i=1; i<=M; i++) {
            for (int j=1; j<=N; j++) {
                if (A.charAt(i-1) == B.charAt(j-1))
                    dp[i][j] = dp[i-1][j-1]+1;
                else 
                    dp[i][j] = 0;
            }
        }
        // Answer
        // Max value of the 2-dimentional array
        int max = 0;
        for (int i=0; i<=M; i++) {
            for (int j=0; j<=N; j++) {
                max = Math.max(max, dp[i][j]);
            }
        }
        return max;
    }

results matching ""

    No results matching ""