Looking for an algorithm to find the beginning and end of a set of data in the middle of an array of-CodePudding

Say I have an array of 5000 elements. Somewhere in the middle of the array, say from a[4200] through a[4300], is data. Values outside of this range return null.

What is the most efficient way to find the first and last entry containing data, while querying the array as few times as possible? Is there a name for what I'm trying to do?

CodePudding user response：

Algorithm:

iterate through array using iterator j
while data[j] = '\0', keep going
condition breaks, set start = that position.
another while, this time, while data[j] != '\0'
once that breaks, set end = that position. minus one if you don't want it to be at a null.

You're done.

As for the "querying the array bit," I imagine it would be rather difficult to not do it this way, as the nulls could hypothetically end anywhere.

CodePudding user response：

If you know that there are k non-null items in the middle of an array of length n, then you may need to make n/k checks before you find a data element, just by checking every kth one. After that you can do a binary search to find the ends.

If you don't know k, then you can recursively divide the range into smaller and smaller regions, in breadth-first order, by checking the middle of each one. This will still take O(n/k) checks in the worst case. The binary searches to find the ends then take O(log k), for O(n/k log k) all together.

Here it is in python:

def findData(A):
    if (len(A)<1):
        return None
    q = [(0,len(A))]
    pos = 0
    while pos < len(q):
        (s,e) = q[pos]
        pos  = 1
        mid = s   (e-s)//2
        if A[mid] == None:
            # Didn't find data. Subdivide range
            if mid > s:
                q.append((s,mid))
            if mid 1 < e:
                q.append((mid 1,e))
            continue

        # Found data
        maxs = mid # max start pos
        mine = mid 1 # min end pos

        # Binary search to find start
        while s < maxs:
            mid = s   (maxs-s)//2
            if A[mid] == None:
                s = mid 1
            else:
                maxs = mid
        
        # Binary search to find end
        while mine < e:
            mid = mine   (e-mine)//2
            if A[mid] == None:
                e = mid
            else:
                mine = mid 1
        
        return(s,e)
    
    # Searched the whole array and found no data
    return None

Test:

$> print(findData([
    None,None,None,1,2,3,4,5,None
]))
(3,8)