Python: Calculate total number of comparisons to find a match-CodePudding

I have two arrays. One with 10 Indexes, One with 2 Indexes.

I want to check if the large array has the exact values of the small array.

There is a total of 9 comparisons that need to be made.

How do I calculate this value for arrays of different sizes?

I need this value to manipulate control flow.

largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]

On the 9th Comparison it will be true.

CodePudding user response：

The brute-force check will take up to len(largeArr) - len(smallArr) 1 comparisons, each of size up to len(smallArr). It will take that many if it's not found. If found, it might take half of that on average, but that depends on the statistics of their entries. So this is O(n), where n = len(largeArr).

However, if largeArr is sorted as your example shows, it would be much more efficient to do a binary search for smallArr[0]. That would make checking be O(log(n)).

Another approach which would be much faster if you want to check many different smallArr against a given largeArr: generate a hash of each consecutive slice of length n = len(smallArr) taken from largeArr, and put those hashes in a set or dict. Then you can very quickly check if a particular smallArr is present by computing its hash and checking for membership in the pre-computed set or dict.

Here's an example of this latter approach:

largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
n = len(smallArr)
match = set()
for i in range(0, len(largeArr) - n   1):
    match.add(tuple(largeArr[i:i n]))
print(tuple(smallArr) in match)

This uses tuples since they are immutable, unlike slices. Checking is now close to O(1), or at least as quick as a set can test membership (which will actually grow slowly with n depending on the implementation).

CodePudding user response：

Here is another solution. The above solution is perfect, my solution just happens to run in constant space and linear time complexity.

That is;

Time: O(N)

Space: O(1)

from typing import List  # for types annotation


# You can solve this in a linear fashion like this...
def mapable(universe: List[int], elements: List[int], from_indx: int) -> bool:
  # tries to address worst case
  last_mapping_indx: int = from_indx   (len(elements) - 1)
  if last_mapping_indx >= len(universe) or not(elements[-1] == universe[last_mapping_indx]):
    return False

  # why use a loop? using a loop is more dynamic, in case elements can change in size
  # tries to match a subset in a set
  for num in elements:
    if from_indx >= len(universe) or not (num == universe[from_indx]):
      return False
    from_indx  = 1
  return True

# T = typedVar('T')   
# you can find a creative way to use zip(itr[T], itr[T]) here to achieve the same

def a_in_b(larger: List[int], smaller: List[int]) -> bool:
  for indx, num in enumerate(larger):
    if num == smaller[0]:
      if mapable(larger, smaller, indx):
        return True
        # return indx   (len(smaller))  # this is the solution if you only care about how many comparison were made
  return False


# this code will check that a list is found in a larger one-dimentional list in linear faction. If you look at the helper-method(mapable) the worst case scenario would be the following

#larger: [8, 8, 8, 8, 8, 8, 8, 8, 8, 9]
#smaller: [8, 9]

# where it tries to iterate through smaller n-1 times. This would drop our complexity from O(N) to O(n * m) where m = len(smaller). Hence why we have an if statement at the beginning of mapable.

largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [8, 9, 10]
print(a_in_b(largeArr, smallArr))  # True

CodePudding user response：

As you have the numpy tag, use a numpy approach:

from numpy.lib.stride_tricks import sliding_window_view as swv

largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]

out = (swv(largeArr, len(smallArr)) == smallArr).any()
# True

Intermediate:

swv(largeArr, len(smallArr))

array([[ 1,  2],
       [ 2,  3],
       [ 3,  4],
       [ 4,  5],
       [ 5,  6],
       [ 6,  7],
       [ 7,  8],
       [ 8,  9],
       [ 9, 10]])

repeated comparisons

If many comparisons need to be done:

from numpy.lib.stride_tricks import sliding_window_view as swv

existing = set(map(tuple, swv(largeArr, len(smallArr))))
tuple(smallArr) in existing
# True

tuple([12, 4]) in existing
# False