Home > Net >  How to modify a longest common subsequence algorithm to a longest common contiguous subsequence algo
How to modify a longest common subsequence algorithm to a longest common contiguous subsequence algo

Time:04-11

I'm wondering how to change the class I have so that it can produce the longest contiguous common subsequence between two strings. This code right now is setup so that I can get the length of the longest common subsequence of the two strings. I want the final result string to be unbroken (it can be found in both strings with no characters in between). I would like it if, for example, sending the method ACGGTTGTCGCAGTCC and TGTAGCAG would result in length 4 for GCAG, instead of what it does now, which is length 7 for TGTGCAG.

public class GenomeSequencer {
    private String x;
    private String y;
public String getLongestCommon() {
        int[][] table = lcsLength();
        return Integer.toString(table[x.length()][y.length()]);
    }

    private int[][] lcsLength() {
        int m = x.length();
        int n = y.length();

        int[][] c = new int[m   1][n   1];

        for (int i = 0; i <= m; i  ) {
            c[i][0] = 0;
        }
        for (int j = 0; j <= n; j  ) {
            c[0][j] = 0;
        }
        for (int i = 1; i <= m; i  ) {
            for (int j = 1; j <= n; j  ) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1]   1;
                } else if (c[i - 1][j] >= c[i][j - 1]) {
                    c[i][j] = c[i - 1][j];
                } else {
                    c[i][j] = c[i][j - 1];
                }
            }
        }
        return c;

    }

}

CodePudding user response:

If you really want to reuse the old solution, consider why it doesn't work anymore.

Of the three transitions, the one when letters are equal should survive, and the two skipping a letter in one of the strings should disappear.

So, the transitions can be just:

                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1]   1;
                } else {
                    c[i][j] = 0;
                }

Note that, with this approach, the answer is no longer at c[m][n]: instead, you should search the whole table for the maximum value.


The simplicity of the transitions suggests there are faster solutions to the longest common substring problem (substring is a consecutive subsequence). And indeed there is a linear yet complex solution involving a suffix tree (mentioned in the above link), or simpler solutions using hashes and binary search for the answer's length.

  • Related