Interview Question: Remove repeating numbers at the end of an array-CodePudding

I got a surprising interview question today at a big Bay Area tech company that I was absolutely stumped by despite seeming so easy. Was wondering if anyone has seen it or can offer a simpler solution as the interviewer didn't want to show me the answer. The solution can be written in any language or pseudocode.

Question:

Given a list of numbers, remove any extraneous repeating suffixes of numbers that appear at the end of the list until it has no repeating suffixes. The repeating set can be cut-off.

For example:

[1,2,3,4,5,6,7,5,6,7,5,6] -> [1,2,3,4,5,6,7]
explanation: [5, 6, 7] were repeating

Also consider the situation

[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5,] -> [1,2,3,4,5,4,5,1] # not [1,2,3,4,5,4,5,1,4,5,4,5,1]
explanation: [4,5,4,5,1] were the largest repeating sets

CodePudding user response：

There are always to ways to approach this topic. Finding any solution and finding an efficient one. It is usually better to start with any and then think on how to optimize it.

Now as we can see in the second example, the problem is complicated by the fact that the repeating pattern is not known. So we could just do it for all the possible patterns at the end. Then we would need to check two things

is it actually repeating
how long is the result Then we could just take the shortest result. Here is some pseudo code:

def remove_repeating_tail(a: list) -> list:
    results = []
    for i in range(len(a)):
        tail = a[i:]
        results.append(remove_repeats(a, tail))

    if len(results) == 0:
        return a
    return sorted(results, key=len)[0]

Also we made sure we cover all the cases. Empty list, no repeating pattern. Next we need to write remove_repeats. Also we check the empty repeating pattern, so we need to be aware of that.

def remove_repeats(a: list, tail: list) -> list:
    assert len(tail) <= len(a)
    if len(tail) == 0:
        return a

    remainder = a
    count = 0
    while remainder[-len(tail):] == tail:
        remainder = remainder[:-len(tail)]
        count  = 1
    if count <= 1:
        return a
    return remainder

We remove the repeating pattern and then add it back at the end. Now its time to test the code if it actually works, if that is possible in the interview.

remove_repeating_tail([1,2,3,4,5,6,7,5,6,7,5,6])
-> [1, 2, 3, 4, 5, 6]

remove_repeating_tail([1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5])
-> [1, 2, 3, 4, 5, 4, 5]

Also good to check some other cases:

remove_repeating_tail([1,2,3,4])
-> [1, 2, 3, 4]

remove_repeating_tail([])
-> []

After quite a bit of fixing we got the above, which I think is correct. In particular I missed:

first I had an infinite loop in remove_repeats for an empty tail
remove_repeats removed always the tail and sometimes everything, as I wasn't checking that there is at least one repeat. I then added the counting.
I made simple mistakes like writing results = res instead of results.append(res) leading to some Exceptions.
Then a lot of simplification. First I used some sentinel None to communicate back that it is not repeating, but we could just return the whole list. Then I checked the repeating with some if before the while loop, but realized its basically doing the same as the first iteration, so I used counting.
Similarly I don't like the if len(results) == 0: check. I would probably add a to the result in the beginning and remove the check, as now there is always a result. Then we could start the counting from 1 instead of 0. Still I kept it in.

If we want something fast, we first need to analyze the complexity.

So remove repeating tails for a list of size n and tail size k is: O(n / k). Then we call this function n times. And then we sort it. Wait why do we sort it, we could just take the minimum return min(results, key=len). That's better.

In each loop we call remove_repeats starting with k = 1 to n. So we have: sum(k = 1 .. n) O(n / k). This is n / 1 n / 2 n / 3 .. n / n. I had to look this up on Wikipedia, but these are called harmonic numbers. We can also just make our live easy and say its less than O(n^2) for now. Otherwise I found an approximation of H_n = n ln(n) 0.5 n here. So the complexity overall is O(n log n). Not to bad I would say. Is it the optimal? Maybe. Here I would compare it to some other similar algorithms (like substring search, etc).

Before going there, at this point, I would check with the interviewer, where he would like to go next. As there are many directions.

CodePudding user response：

If a string has a repeating suffix, then the last character must occur more than once, because it has to repeat.

Conversely, if a string has no repeating suffixes, then the last character must occur only once, because otherwise it would be a repeating suffix.

Also, if a string has a multi-char repeating suffix, then it will have a repeating suffix at the same place after removing the last character: abXab -> abXa.

So... find the first occurrence of each character and truncate the string after the last first occurrence, to create the longest possible prefix with no repeating suffixes.

In Java, it looks like this:

String removeRepeatingSuffixes(String str) {
    HashSet<Character> set = new HashSet<>();
    int cut = 0;
    for (int i=0; i<str.length();   i) {
        if (set.add(str.charAt(i))) {
            // add() returns true if it didn't already exist
            cut = i 1;
        }
    }
    return str.substring(0, cut);
}

CodePudding user response：

This seems a tricky question and there may not be a simple solution. Best solution I can think of would be O(n) time and O(n) and that is if I am not missing any edge case.

Let's take as example [1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1]

Steps would be as follows:

Iterate through the input array from last index to first and Build a dictionary (hashtable) with every number in the array being a key and value: a list of positions where the specific number is found in the array.

occurences dictionary will become:

{
    5: [14, 11, 9, 6, 4],
    4: [13, 10, 8, 5, 3],
    1: [12, 7, 0],
    3: [2]
    2: [1]
}

Find the possible suffix lengths by calculating deltas between every position and first position for every number. This way we take into consideration the case in which a specific number repeats in the suffix or in the prefix.

We then add each distinct possible suffix lenght to a set.

We sort the possible sufix lengths in descending order.

We get following suff lenghts: [12, 10, 7, 5, 2]

For every possible length l, we test if arr[n-1] == arr[n-1-l]. If l is our suffix's length, it means that the number at last position is repeated at exactly l positions before. We then check the last l elements to respect the same condition. If they do, we found the maximum suffix lenght. If not, the max suffix lenght is even smaller, so we check the next possible lenght.
After finding the correct suffix lenght, we delete the remaining numbers that repeat at positions pos-l. We then return the slice of array with suffix removed.

def removeRepeatingSuffixes(arr):
    if not arr:
        return []
    n = len(arr)
    occurences = {}
    for i in range(n - 1, -1, -1):
        c = arr[i]
        if c not in occurences:
            occurences[c] = []
        occurences[c].append(i)

    # treat edge case: no repeating suffix
    if len(occurences[arr[n-1]]) == 1:
        return arr

    # create a set of possible suffix lengths, 
    # based on the differences between the positions of each number.
    possible_sufixes_lengths_set = set()
    for c, olist in occurences.items():
        if len(olist) >= 2:
            for i in range(len(olist)-1):
                delta = olist[i] - olist[len(olist)-1]
                possible_sufixes_lengths_set.add(delta)

    suff_lenghts = sorted(possible_sufixes_lengths_set, reverse=True)
    for l in suff_lenghts:
        if arr[n - 1] == arr[n - 1 - l]:
            # possible suffix length, check if last l characters repeat
            ok_length = True
            for j in range(n-2, n-1-l, -1):
                if arr[j] != arr[j-l]:
                    ok_length = False
                    break
            if ok_length:
                last_i = n-1-l
                while last_i > 0 and arr[last_i] == arr[last_i - l]:
                    last_i -= 1
                # return non-repeating slice, from 0 to last_i
                return arr[0:last_i   1]