When making comparison between the elements in a list, how to efficiently iterate and improve the ti-CodePudding

I have a list where I would like to compare each element of the list with each other. I know we can do that using a nested loop but the time complexity is O(n^2). Is there any option to improve the time complexity and make the comparisons efficient?

For example:

I have a list where I would like to find the difference in digits among each element. Consider a list array=[100,110,010,011,100] where I am trying to find the difference in the digits among each integer. array[0] is same as array[4] (i.e 100 and 100), while array[0] has 1 digit that is different from array[1] (i.e 100 and 110) and array[0] has 3 digits that are different from array[3] (i.e 100 and 011). Assuming similar integers are defined as integers that have either identical or the difference in digits is just 1, I would like to return a list as output, where every element denotes the integers with similar digits (i.e difference in digits <=1).

For the input list array=[100,110,010,011,100], my expected output should be [2,3,2,1,2]. In the output list, the output[0] indicates that array[0] is similar to array[1] and array[4] (i.e similar to 100 , we have 2 other integers 110,100 in the list)

This is my code that works, though very inefficient O(n^2):

def diff(a,b):
    difference= [i for i in range(len(a)) if a[i]!=b[i]]
    return len(difference)

def find_similarity_int(array):
    # write your code in Python 3.6
    res=[0]*len(array)
    string=[]
    for n in array:
        string.append(str(n))
    for i in range(0,len(string)):
        for j in range(i 1,len(string)):
            count=diff(string[i],string[j])
            if(count<=1):
                res[i]=res[i] 1
                res[j]=res[j] 1

    return res

input_list=['100','110','010','011','100']
output=find_similarity_int(input_list)
print("The similarity metrics for the given list is : ",output)

Output:

The similarity metrics for the given list is : [2, 3, 2, 1, 2]

Could anyone please suggest an efficient way to make the comparison, preferably with just 1 loop? Thanks!

CodePudding user response：

If the values are binary digits only, you can get a O(nxm) solution (where m is the width of the values) using a multiset (Counter from collections). With the count of values in the multiset, add the counts of items that correspond to exactly one bit change in each number (plus the number of duplicates):

from collections import Counter
def simCount(L):
    counts = Counter(L)  # multiset of distinct values / count
    result = []
    for n in L:
        r = counts[n]-1                              # duplicates
        for i,b in enumerate(n):                     # 1 bit changes
            r  = counts[n[:i] "01"[b=="0"] n[i 1:]]  # count others
        result.append(r)                             # sum of similars
    return result

Output:

A = ['100','110','010','011','100']

print(simCount(A)) # [2, 3, 2, 1, 2]

To avoid the string manipulations on every item, you can convert them to integers and use bitwise operators to make the 1-bit changes:

from collections import Counter
def simCount(L):
    bits   = [1<<i for i in range(len(L[0]))] # bit masks
    L      = [int(n,2) for n in L]            # numeric values
    counts = Counter(L)                       # multiset n:count
    result = []
    for n in L:
        result.append(counts[n]-1)            # duplicates
        for b in bits:                        # 1 bit changes
            result[-1]  = counts[b^n]         # sum similars
    return result

A = ['100','110','010','011','100']

print(simCount(A)) # [2, 3, 2, 1, 2]