Home > Software engineering >  Repeatedly removing the maximum average subarray
Repeatedly removing the maximum average subarray

Time:02-28

I have an array of positive integers. For example:

[1, 7, 8, 4, 2, 1, 4]

A "reduction operation" finds the array prefix with the highest average, and deletes it. Here, an array prefix means a contiguous subarray whose left end is the start of the array, such as [1] or [1, 7] or [1, 7, 8] above. Ties are broken by taking the longer prefix.

Original array:  [  1,   7,   8,   4,   2,   1,   4]

Prefix averages: [1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]

-> Delete [1, 7, 8], with maximum average 5.3
-> New array -> [4, 2, 1, 4]

I will repeat the reduction operation until the array is empty:

[1, 7, 8, 4, 2, 1, 4]
^       ^
[4, 2, 1, 4]
^ ^
[2, 1, 4]
^       ^
[]

Now, actually performing these array modifications isn't necessary; I'm only looking for the list of lengths of prefixes that would be deleted by this process, for example, [3, 1, 3] above.

What is an efficient algorithm for computing these prefix lengths?


The naive approach is to recompute all sums and averages from scratch in every iteration for an O(n^2) algorithm-- I've attached Python code for this below. I'm looking for any improvement on this approach-- most preferably, any solution below O(n^2), but an algorithm with the same complexity but better constant factors would also be helpful.

Here are a few of the things I've tried (without success):

  1. Dynamically maintaining prefix sums, for example with a Binary Indexed Tree. While I can easily update prefix sums or find a maximum prefix sum in O(log n) time, I haven't found any data structure which can update the average, as the denominator in the average is changing.
  2. Reusing the previous 'rankings' of prefix averages-- these rankings can change, e.g. in some array, the prefix ending at index 5 may have a larger average than the prefix ending at index 6, but after removing the first 3 elements, now the prefix ending at index 2 may have a smaller average than the one ending at 3.
  3. Looking for patterns in where prefixes end; for example, the rightmost element of any max average prefix is always a local maximum in the array, but it's not clear how much this helps.

This is a working Python implementation of the naive, quadratic method:

def find_array_reductions(nums: List[int]) -> List[int]:
    """Return list of lengths of max average prefix reductions."""

    def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:
        """Return value and length of max average prefix in arr"""
        if len(arr) == 0:
            return (-math.inf, 0)
        
        best_length = 1
        best_average = -math.inf
        running_sum = 0.0

        for i, x in enumerate(arr, 1):
            running_sum  = x
            new_average = running_sum / i
            
            if (new_average >= best_average
                or math.isclose(new_average, best_average)):
                
                best_average = new_average
                best_length = i

        return (best_average, best_length)

    removed_lengths = []
    total_removed = 0

    while total_removed < len(nums):
        _, new_removal = max_prefix_avg(nums[total_removed:])
        removed_lengths.append(new_removal)
        total_removed  = new_removal

    return removed_lengths

CodePudding user response:

This problem has a fun O(n) solution.

If you draw a graph of cumulative sum vs index, then:

The average value in the subarray between any two indexes is the slope of the line between those points on the graph.

The first highest-average-prefix will end at the point that makes the highest angle from 0. The next highest-average-prefix must then have a smaller average, and it will end at the point that makes the highest angle from the first ending. Continuing to the end of the array, we find that...

These segments of highest average are exactly the segments in the upper convex hull of the cumulative sum graph.

Find these segments using the monotone chain algorithm. Since the points are already sorted, it takes O(n) time.

# Lengths of the segments in the upper convex hull
# of the cumulative sum graph
def upperSumHullLengths(ar):
    if (len(ar) < 2):
        if len(ar < 1):
            return []
        else:
            return [1]
    
    hull = [(0,0),(1,ar[0])]
    for x in range(2,len(ar) 1):
        # this has x coordinate x-1
        prevPoint = hull[len(hull)-1]
        # next point in cumulative sum
        point = (x, prevPoint[1] ar[x-1])
        # remove points not on the convex hull
        while len(hull) >= 2:
            p0 = hull[len(hull)-2]
            dx0 = prevPoint[0] - p0[0]
            dy0 = prevPoint[1] - p0[1]
            dx1 = x-prevPoint[0]
            dy1 = point[1] - prevPoint[1]
            if dy1*dx0 < dy0*dx1:
                break
            hull.pop()
            prevPoint = p0
        hull.append(point)
    
    return [hull[i 1][0] - hull[i][0] for i in range(0,len(hull)-1)]


print(upperSumHullLengths([  1,   7,   8,   4,   2,   1,   4]))

prints:

[3, 1, 3]

CodePudding user response:

The following algorithm has average performance of O(nlogn).

I have written expressions in Ruby but they should be easy to follow.

arr = [1, 7, 8, 4, 2, 1, 4]

Suppose we begin by constructing an array of the cumulative sums of values in arr:

a = [1, 8, 16, 20, 22, 23, 27]

The (rounded) averages of the elements of a are seen to be

[1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]

For example, the average of the element of a at index 3 equals

a[3]/(3 1) #=> 20/4 #=> 5.0

Initialize two variables:

cum_so_far = 0
max_avg_indices = [-1]

Then compute the index i of a for which the average is greatest:

i = (a.size-1).downto(0).max_by { |i| (a[i]-cum_so_far).to_f/(i 1) }
  #=> 2

Update variables:

cum_so_far = a[i]
  #=> 16
a = a[i 1..-1]
  #=> [20, 22, 23, 27]
max_avg_indices << max_avg_indices.last   i   1
  #=> [-1, 2]

The (rounded) averages of the updated array a are seen to be

a.each_with_index.map { |n,i| (n-cum_so_far).to_f/(i 1) }
  #=> [4.0, 3.0, 2.3, 2.8]

We therefore may compute the index of a having the maximum average by using the same expression as before:

i = (a.size-1).downto(0).max_by { |i| (a[i]-cum_so_far).to_f/(i 1) }      
  #=> 0

Update variables:

cum_so_far = a[i]
  #=> 20
a = a[i 1..-1]
  #=> [22, 23, 27]
max_avg_indices << max_avg_indices.last   i   1
  #=> [-1, 2, 3]

Repeat:

i = (a.size-1).downto(0).max_by { |i| (a[i]-cum_so_far).to_f/(i 1) }      
  #=> 2
cum_so_far = a[i]
  #=> 27
a = a[i 1..-1]
  #=> []
max_avg_indices << max_avg_indices.last   i   1
  #=> [-1, 2, 3, 6]

Since a is empty we are finished. The desired indices are given by

max_avg_indices.drop(1)
  #=> [2, 3, 6]

The lengths of the subarrays are therefore

[2 1, 3-2, 6-3]
   #=> [3, 1, 3]
  • Related