Let's suppose that array B is made from array A by concatenating it with itself n times (example: A=[1,2,3], n=3, B=[1,2,3,1,2,3,1,2,3]) What is an efficient algorithm to find A by given B (we don't know n)? UPD we search for smallest A (when B=[1,2,1,2,1,2,1,2], A = [1,2], not [1,2,1,2])
CodePudding user response:
Assuming that [1,2,1,2,1,2,1,2]
means n
is 4 and not 2, for example. In other words, you're assuming the smallest such sublist, A. Otherwise, there could be multiple solutions.
Enumerate the unique integer divisors of the length of B (including 1). These would be the only valid candidates for
n
.For each divisor, starting with the smallest, consider it as a candidate value for
n
:a. Take the first
len(B)/n
elements of B and start checking whether it is a sublist that repeats through B (I'll assume you can figure out an efficient way of doing this. I can add a suggestion if you need it.)b. If
n
works (you get to the end of B and everything matched), then you're done, otherwise, try the next divisor
CodePudding user response:
You could basically find the longest prefix in B
that is also a suffix. You can derive the table from the steps involved in KMP pattern matching
algorithm.
Note that there could be multiple correct solutions.(say 1,2,1,2,1,2,1,2
could have A
as 1,2,1,2
or 1,2
.
Once found, you will need to rerun the match against the slices of B
just to make sure the whole array B
matches the repeating pattern. This is necessary since there could be edge cases such as 1,2,1,2,3,4,1,2,1,2
which has 1,2,1,2
as the longest prefix that is also a suffix but it isn't a continuous repetition of A
.
If the obtained length doesn't divide the length of B
evenly, you will need to decrease the length evenly(as in factor wise) each time to see which one matches. (Example case: 1,2,1,2,1,2
).