I have two arrays. One with 10 Indexes, One with 2 Indexes.
I want to check if the large array has the exact values of the small array.
There is a total of 9 comparisons that need to be made.
How do I calculate this value for arrays of different sizes?
I need this value to manipulate control flow.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
On the 9th Comparison it will be true.
CodePudding user response:
The brute-force check will take up to len(largeArr) - len(smallArr) 1
comparisons, each of size up to len(smallArr)
. It will take that many if it's not found. If found, it might take half of that on average, but that depends on the statistics of their entries. So this is O(n), where n = len(largeArr)
.
However, if largeArr
is sorted as your example shows, it would be much more efficient to do a binary search for smallArr[0]
. That would make checking be O(log(n))
.
Another approach which would be much faster if you want to check many different smallArr
against a given largeArr
: generate a hash of each consecutive slice of length n = len(smallArr)
taken from largeArr
, and put those hashes in a set
or dict
. Then you can very quickly check if a particular smallArr
is present by computing its hash and checking for membership in the pre-computed set
or dict
.
Here's an example of this latter approach:
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
n = len(smallArr)
match = set()
for i in range(0, len(largeArr) - n 1):
match.add(tuple(largeArr[i:i n]))
print(tuple(smallArr) in match)
This uses tuples since they are immutable, unlike slices. Checking is now close to O(1), or at least as quick as a set
can test membership (which will actually grow slowly with n
depending on the implementation).
CodePudding user response:
Here is another solution. The above solution is perfect, my solution just happens to run in constant space and linear time complexity.
That is;
Time: O(N)
Space: O(1)
from typing import List # for types annotation
# You can solve this in a linear fashion like this...
def mapable(universe: List[int], elements: List[int], from_indx: int) -> bool:
# tries to address worst case
last_mapping_indx: int = from_indx (len(elements) - 1)
if last_mapping_indx >= len(universe) or not(elements[-1] == universe[last_mapping_indx]):
return False
# why use a loop? using a loop is more dynamic, in case elements can change in size
# tries to match a subset in a set
for num in elements:
if from_indx >= len(universe) or not (num == universe[from_indx]):
return False
from_indx = 1
return True
# T = typedVar('T')
# you can find a creative way to use zip(itr[T], itr[T]) here to achieve the same
def a_in_b(larger: List[int], smaller: List[int]) -> bool:
for indx, num in enumerate(larger):
if num == smaller[0]:
if mapable(larger, smaller, indx):
return True
# return indx (len(smaller)) # this is the solution if you only care about how many comparison were made
return False
# this code will check that a list is found in a larger one-dimentional list in linear faction. If you look at the helper-method(mapable) the worst case scenario would be the following
#larger: [8, 8, 8, 8, 8, 8, 8, 8, 8, 9]
#smaller: [8, 9]
# where it tries to iterate through smaller n-1 times. This would drop our complexity from O(N) to O(n * m) where m = len(smaller). Hence why we have an if statement at the beginning of mapable.
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [8, 9, 10]
print(a_in_b(largeArr, smallArr)) # True
CodePudding user response:
As you have the numpy tag, use a numpy approach:
from numpy.lib.stride_tricks import sliding_window_view as swv
largeArr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallArr = [9, 10]
out = (swv(largeArr, len(smallArr)) == smallArr).any()
# True
Intermediate:
swv(largeArr, len(smallArr))
array([[ 1, 2],
[ 2, 3],
[ 3, 4],
[ 4, 5],
[ 5, 6],
[ 6, 7],
[ 7, 8],
[ 8, 9],
[ 9, 10]])
repeated comparisons
If many comparisons need to be done:
from numpy.lib.stride_tricks import sliding_window_view as swv
existing = set(map(tuple, swv(largeArr, len(smallArr))))
tuple(smallArr) in existing
# True
tuple([12, 4]) in existing
# False