Is there a more efficient way to compare two lists in python than O(m*n)?-CodePudding

I am trying to find a method for comparing two lists in python in a more efficient way than what I think is the current O(m*n) runtime. Right now I have a brute force approach of iterating each item in m and comparing it to n but is anything else possible? I have tried maybe sorting the lists first for maybe something faster but I am kind of stuck on whether anything else could work here.

In my function i take each item in m and compare it to n and count the number of times the item in m is greater than the item in n.

n = [1,3,7]
m = [2,9]

def comparison(n,m):
counter = 0
for i in m:
    for j in n:
        if i >= j:
            counter  = 1
            
return counter

CodePudding user response：

Here's how you could use a binary search approach after sorting the target list:

from bisect import bisect_right


n = [1,3,7,2]
m = [2,9]

n.sort()
counter = sum(bisect_right(n,value) for value in m)

print(counter) # 6

This should correspond to O((n m) x log(n)) if n is not known to be sorted. If n is always provided in sorted order, then you don't need your function to sort it and you will get O(m x log(n)) time complexity.

CodePudding user response：

I wrote a code for you to test which one runs faster using the built-in "timeit" library. You can test others' advice using the same structure. There is the code:

import timeit
import numpy as np

n = [1,3,7]
m = [9,2]


my_code = '''
def comparison(n,m):
    counter = 0
    for i in n:
        for j in m:
            if i >= j:
                counter  = 1
                
    return counter
'''



mysetup = "import numpy as np"
my_code2 = '''
def comparison_with_numpy(n,m):
    x = np.array(n)
    y = np.array(m)
    
    smaller = np.array([x[i] > y[:] for i in range(x.shape[0])]).astype('int')

    return sum(smaller)[0]

'''

my_code3 = '''
def sort_first(n,m):
    sorted(n)
    sorted(m)
    count = 0
    
    if len(n) > len(m):
        iteration = len(n)
    else:
        iteration = len(m)
    
    for _ in range(iteration):
        if n != []:
            y = n.pop(0)
        if m != []:
            x = m.pop(0)
        
        if y > x:
            count  = 1
    return count

'''

def comparison(n,m):
    counter = 0
    for i in n:
        for j in m:
            if i >= j:
                counter  = 1
                
    print(counter)
    return counter

def comparison_with_numpy(n,m):
    x = np.array(n)
    y = np.array(m)
    
    smaller = np.array([x[i] > y[:] for i in range(x.shape[0])]).astype('int')

    return sum(smaller)[0]

def sort_first(n,m):
    sorted(n)
    sorted(m)
    count = 0
    
    if len(n) > len(m):
        iteration = len(n)
    else:
        iteration = len(m)
    
    for _ in range(iteration):
        if n != []:
            y = n.pop(0)
        if m != []:
            x = m.pop(0)
        
        if y > x:
            count  = 1
    return count

def main():
    print('comparison /w sort\t\t',timeit.timeit(stmt = my_code3,number=10000))
    print('comparison\t\t',timeit.timeit(stmt = my_code,number=10000))
    print('comparison with numpy\t\t',timeit.timeit(setup = mysetup
                                                    ,stmt = my_code2
                                                    ,number=10000))



if __name__ == "__main__":
    main()