I'm wondering how to change the class I have so that it can produce the longest contiguous common subsequence between two strings. This code right now is setup so that I can get the length of the longest common subsequence of the two strings. I want the final result string to be unbroken (it can be found in both strings with no characters in between). I would like it if, for example, sending the method ACGGTTGTCGCAGTCC and TGTAGCAG would result in length 4 for GCAG, instead of what it does now, which is length 7 for TGTGCAG.
public class GenomeSequencer {
private String x;
private String y;
public String getLongestCommon() {
int[][] table = lcsLength();
return Integer.toString(table[x.length()][y.length()]);
}
private int[][] lcsLength() {
int m = x.length();
int n = y.length();
int[][] c = new int[m 1][n 1];
for (int i = 0; i <= m; i ) {
c[i][0] = 0;
}
for (int j = 0; j <= n; j ) {
c[0][j] = 0;
}
for (int i = 1; i <= m; i ) {
for (int j = 1; j <= n; j ) {
if (x.charAt(i - 1) == y.charAt(j - 1)) {
c[i][j] = c[i - 1][j - 1] 1;
} else if (c[i - 1][j] >= c[i][j - 1]) {
c[i][j] = c[i - 1][j];
} else {
c[i][j] = c[i][j - 1];
}
}
}
return c;
}
}
CodePudding user response:
If you really want to reuse the old solution, consider why it doesn't work anymore.
Of the three transitions, the one when letters are equal should survive, and the two skipping a letter in one of the strings should disappear.
So, the transitions can be just:
if (x.charAt(i - 1) == y.charAt(j - 1)) {
c[i][j] = c[i - 1][j - 1] 1;
} else {
c[i][j] = 0;
}
Note that, with this approach, the answer is no longer at c[m][n]
: instead, you should search the whole table for the maximum value.
The simplicity of the transitions suggests there are faster solutions to the longest common substring problem (substring is a consecutive subsequence). And indeed there is a linear yet complex solution involving a suffix tree (mentioned in the above link), or simpler solutions using hashes and binary search for the answer's length.