Home > Software engineering >  How to find the longest common substring of n strings using suffix array?
How to find the longest common substring of n strings using suffix array?

Time:10-13

I could do longest common substring using two strings each time. But consider 3 strings below:

  1. ABZDCC
  2. ABZDEC
  3. EFGHIC

Here we see that the lcs of the first two strings is ABZD. But when this will be compared to the third string, the length of lcs will be zero. But it is clear that the lcs is "C". How can I find the longest common substring of n strings using suffix array?

CodePudding user response:

If you have a suffix array that contains all the suffixes of every input string, then for any string X that is a (contiguous) substring of all the input strings, there is a contiguous subarray in which every suffix starts with X, that includes a suffix from every input string.

Furthermore, if X is a longest common substring, then the subarray can be as small as possible, such that the first and last suffixes in the subarray are the only suffixes from their corresponding input strings.

To find the longest substring, then:

  1. For each position in the suffix array, find the smallest subarray starting at that position that includes a suffix from every string. You can use the incrementing-two-pointers technique to do this in amortized constant time per entry.
  2. For each such subarray, find the longest common prefix of the first and last entries, which will be a common prefix of every item in the subarray.
  3. Remember the longest common prefix you find, which is the longest substring that occurs in every input.

CodePudding user response:

When working with more than two strings, find all common substrings between the two shortest input strings and then start eliminating common substrings that aren't included in the other input strings. When done, return the longest common remaining substring.

  • Related