Home > Enterprise >  Find two numbers in array such that all elements between them is smaller than smallest of two number
Find two numbers in array such that all elements between them is smaller than smallest of two number

Time:10-04

how to find pairs in array such that the elements which are between them are smaller than smallest element from the pair.

Eg.
10 4 6 8 7

In the above pairs can be formed like:
(10,4), (4,6), (6,8), (8,7) As there are no elements between them so it also works
(10,6) As only elements between them is 4 and 4 < 6 which is smallest of 10,6
(10, 8) * As 4 < 8 and also 6 < 8 *

but (10,7) will not make pair as 8 > 7

All Pairs:

(10, 4), (4, 6), (6, 8), (8, 7), (10, 6), (10, 8)

My Code:

def findPirs(n, values):
    res = 0
    arr = []
    for start in range(n-1):
        currMax = 0
        for end in range(start 1, n):
            m = min(values[start], values[end])
            if currMax < m:
                arr.append((values[start], values[end]))
                res  = 1
            currMax = max(currMax, values[end])

    return res, arr

This is my current Code, it works but it is very inefficient, as it has O(n^2) Time complexity.
Any Suggestion on how to improve?

CodePudding user response:

You don't need to take a minimum. If the new endpoint is greater than the maximum you've detected so far, then you have a winner.

def findPairs(values):
    arr = []
    for start in range(len(values)-1):
        currMax = 0
        for end in values[start 1:]:
            if end > currMax:
                arr.append((values[start], end))
                currMax =  end
    return arr

nums = [10, 4, 6, 8, 7]
print(findPairs( nums ))

CodePudding user response:

I thought some more at the solution I hinted at, in a comment above, and I think it could work. You will never get better than O(n²) worst-case, since you might have to output O(n²) pairs, but the thought is that you might want to have an algorithm that is fast when you do not have to output much, and then (necessarily) slow when there is much to output. An O(n z) algorithm, where z is the size of the output, is optimal for a case where you need to look at all the data, and you need to output all the pairs.

So, my solution, sketched below does that. It cannot compete with a fast O(n²) solution if the output is large--there is more overhead in it--but asymptotically it should work fine when you output o(n²).

The idea is only for intervals longer than two, so I treat all neighbours as a special case that I just report. I can easily do that in O(n). For the longer intervals, I consider the input a sequence of hills and valleys, with hill tops at the positions i where x[i-1] < x[i] and x[i] < x[i 1]. I don't know how you want to treat cases where you have runs of the same value, but you can adapt the code to handle it. It just depends on what you want to do with them.

If I have two neighbouring hill tops, there is a valley between them, and I must report intervals running down both sides. In the middle of the valley there will be some values that I cannot use, initially it is the minimal value in an interval (that cannot be included), and later I mask out larger chunks, turning small hills into valleys.

So, overall, I take pairs of neighbouring hill tops, output intervals on the hill sides between them.

def report_valid_intervals(x: list[int]):
    """Report intervals (i,j) such that for all i <= k < j, x[k] <= min(x[i],x[j])."""

    # All neighbours are valid, and these are a special case that we handle first.
    for i in range(len(x) - 1):
        yield i, i 1

    # Identify hills and valleys in the sequence. These are the valleys we use
    # to report intervals where the left and right points are larger than the
    # interval values. Identifying the hill tops (tops) and valleys (masks)
    # is done in O(n).
    masks = [False] * len(x)
    tops = find_tops(x, masks)

    # Now process valleys one by one. In each valley, we report the intervals inside it
    # in O(z) -- linear in what we report -- and then we update the mask, throwing
    # away one hill top (making it into a valley) for each pair, also in O(z) (we
    # only look at indices we also used in the output). When we are through all the
    # valleys we have elimiated at least half of them. Because we eliminate half each
    # time, the running time (excluding reporting) is the sum n   n/2   n/4   ... a
    # geomemtric sum that converges to O(n).
    while len(tops) > 1:
        for left, right in zip(tops, tops[1:]):
            for i, j in report_intervals_in_valley(x, left, right, masks):
                yield i, j
            update_valley_mask(x, left, right, masks)

        # eliminate the smaller tops
        tops = [top for top in tops if not masks[top]]

You can identify the hills and valleys as easy as this (but you might want to redefine the tops to handle cases where you have runs of the same value).

# Identifying hills and valleys...
def find_tops(x: list[int], masks: list[bool]) -> list[int]:
    """Locate the tops of the hills and mask out the bottom of the valleys.
    Runs in O(n) and only runs once."""
    # Get hill tops
    start = [0] if x[0] > x[1] else []
    end = [len(x)-1] if x[-1] > x[-2] else []
    mid = [i for i in range(1, len(x)-1) if x[i-1] < x[i] > x[i 1]]
    tops = start   mid   end

    # mask valley bottoms. Doesn't matter with the end-points,
    # since they are only relevant as tops.
    for i in range(1, len(x)-1):
        if x[i-1] > x[i] < x[i 1]:
            masks[i] = True

    return tops

Iteratively running through pairs neighbouring hill tops is linear time, because we eliminate half the tops each time, getting a geometric sum. I am not sure that argument is necessary here, if we analyse the underlying running time a bit more, but it is there if we need it. We just have to process each valley in time proportional to the output we get from it. And here it gets a little tricky.

You can output intervals in a valley by running down one side, and then running down the other side as long as you do not include a value smaller than the next value on the first side. If you output something every time you do this, you get O(z), but you could run down a side and test the first value on the other side repeatedly, without outputting anything. Then you spend O(m) time, where m is the length of the side of the valley, and then the running time goes above O(n z).

However, we are going to mask out indices that cannot be used after we have processed a valley. All indices with a value smaller than the smallest peak cannot be part of an interval that spans across a hill top, and we can mask those out so we do not consider them. If we process valleys such that we do the expensive work on the smaller hill, we will mask out the indices we process, and since they are only masked out once and then never considered again, we are back to O(n z).

# When reporting from a valley, work out the valid left and right interval to
# consider. Those are the indices from the left/right respectively until we see a
# masked index. The masked indices are valleys that we have already processed,
# and we do not need to consider anything in there, as intervals spanning them
# must also contain values that are too large for a hill top inbetween.


def report_intervals_in_valley_left(x: list[int],
                                    left_top: int, right_top: int,
                                    masks: list[bool]):
    """Report all valid intervals in a valley. The running time is O(lv z) where
    z is the number of intervals we report and lv is the size of the left side
    of the valley (because we look at all those indices even if we do not
    report). We will mask out those intervals afterward so we never look at
    them again, so they cannot contribute more than O(n) to the algorithm in
    total."""
    for i in range(left_top, right_top):
        if masks[i]:  # end of this side of the valley
            break
        for j in range(right_top, left_top, -1):
            if masks[j]:
                break
            if x[j] < x[i   1]:  # we cannot go below the next on the other side
                break  # no more hits from i
            yield i, j


def report_intervals_in_valley_right(x: list[int],
                                     left_top: int, right_top: int,
                                     masks: list[bool]):
    """Report all valid intervals in a valley. The running time is O(rv z) where
    z is the number of intervals we report and rv is the size of the right side
    of the valley (because we look at all those indices even if we do not
    report). We will mask out those intervals afterward so we never look at
    them again, so they cannot contribute more than O(n) to the algorithm in
    total."""
    for j in range(right_top, left_top, -1):
        if masks[j]:
            break
        for i in range(left_top, right_top):
            if masks[i]:  # end of this side of the valley
                break
            if x[i] < x[j - 1]:  # we cannot go below the next on the other side
                break  # no more hits from i
            yield i, j


def report_intervals_in_valley(x: list[int],
                               left_top: int, right_top: int,
                               masks: list[bool]):
    """Report all valid intervals in a valley. The running time is O(v z) where
    z is the number of intervals we report and v is the size of the side of the
    valley that will be masked out and not contribute to the running time again."""
    # make sure it is the side with the smallest top that risks doing
    # the most work
    if x[left_top] < x[right_top]:
        yield from report_intervals_in_valley_left(x, left_top, right_top, masks)
    else:
        yield from report_intervals_in_valley_right(x, left_top, right_top, masks)

When you update a valley by masking, you remove the values smaller than the smallest peak. Here, my current code doesn't quite do the right thing.

def update_valley_mask(x: list[int],
                       left_top: int, right_top: int,
                       masks: list[bool]):
    """Mask out the indices that are smaller than the smallest top. Those cannot
    be part of valid pairs when we merge this valey with its neighbours. The running
    time is less than the time we spent reporting intervals, since each index we
    consider was part of at least one pair that we outputted."""
    mi = min(x[left_top], x[right_top])
    masks[left_top if x[left_top] == mi else right_top] = True

    for i in range(left_top, right_top):  # range is entire interval, but we break
        if masks[i]:
            break  # we enter the already masked, so get out to get the correct running time
        if x[i] < mi:
            masks[i] = True
    for i in range(right_top, left_top, -1):
        if masks[i]:
            break
        if x[i] < mi:
            masks[i] = True

The code processes the sides of the valley, and it looks at indices that do not get masked out. That means that it doesn't guarantee the O(n z) running time. However, it should be simple enough to keep track of the closest masked index on the left and right of each hill top. We know the tops of our valley when we mask, so it would just be updating a list. If we do that, then we can start from the closest masked value and work our way up towards the hill top, and then we will mask all the indices we process exact the last, and then the running time is correct again.

I can have a go at it, but I suspect this is only of theoretical interest. The O(n²) solutions are probably fast enough for most cases (and if you have to output many intervals you cannot do better). But I'm reasonably sure that you can do it in O(n z), and probably smarter than this solution.

  • Related