Home > front end >  Shortest string that can be made in at least two ways from an array of strings
Shortest string that can be made in at least two ways from an array of strings

Time:10-24

Statement:

Given an array of strings S. You can make a string by combining elements from the array S (you can use an element more than once)

In some situations, there are many ways to make a certain string from the array S.

Example:

S = {a, ab, ba}

Then there are 2 ways to make the string "aba":

  • "a" "ba"
  • "ab" "a"

Question:

Given an array of string S. Find the shortest string such that there are more that one way to make that string from S. If there're none, print out -1.

P/S: I have been thinking for many days but this is the best one I've got so far:

  • Generate all permutations of the array
  • For each permutation, make a string from the array S
  • Check if that string is made before, if yes, print out the string, if not, save that string.

But this algorithm clearly won't pass all the test cases. I can't think of any better algorithm.

CodePudding user response:

Imagine that you have found your string, and you are matching it two different ways with strings from S. You start with two different strings that match the prefix, and then you repeatedly add a string from S to the shorter one until you end up with a matching length. From your example, that's

"ab"
"a"

"ab"
"aba"

"aba"
"aba"

At every step, you have 2 different strings from S that overlap at the end.

Imagine a directed graph where every vertex is a tuple (i,j,t), where i and j are the indexes of the overlapping strings at the end, and t is the number of characters left over at the end of the longer one after the overlapping section. Make it a rule that t >= 0 and that string i is always the one that ends first.

The edges of the graph indicate which vertexes you can get to by adding a new string to the shorter one, with a cost equal to the length of the added string. Of course you can only add a string if it overlaps with the t characters left over on the longer side.

Your task is then to use Dijkstra's algorithm to find the shortest path in this graph, from an initial selection of 2 distinct strings to a pair with t=0. Initially sorting the array of strings will let you use a binary search to find the strings that overlap the required suffix (the longer ones will all be together), which is an effective optimization.

CodePudding user response:

Here's an O(N3) algorithm, where N is the total length of each string:

  1. For every element Si in S:
    1. Construct an NFA for the regular expression Si(S1|...|Sn)*
    2. Construct an NFA for the regular expression (S1|...|Si-1|Si 1|...|Sn)(S1|...|Sn)*
    3. Construct an NFA that is the intersection of the NFA in step 1.1 and step 1.2
    4. Find the shortest string accepted by the NFA in step 1.3
  2. Return the shortest string among the strings in step 1.4

The above algorithm can be improved to O(N2logN):

  1. Construct an NFA for the regular expression (S1|...|Sn)*
  2. Construct cross-product of two copies of the NFA in step 1.
  3. For each state in the NFA in step 2, find the shortest string accepted by the NFA from that state.
  4. Let
  • Related