How to check if row is contained in another row?-CodePudding

The output of the function is to remove all lines (including list1 after the text) whose first n strings are contained in another line.

The code I tried is terribly written and super slow, but I have no other idea about how to do this. Is there an easier and faster way to do this?

The order of each element in the line is important.

    list1 = [["0","0","0","0","0"],
["0","0","0","0","0","0"],
["0","0","0","0","0","275"],
["0","0","0","0","0","275","275"],
["0","0","0","0","275"],
["0","0","0","0","275","275"],
["0","0","0","0","275","990"],
["0","0","0","0","275","990","990"],
["0","0","0","0","275","990","2761"],
["0","0","0","0","275","990","2761","2761"],
["0","0","0","0","688"],
["0","0","0","0","688","688"],
["0","0","0","0","688","1940"],
["0","0","0","0","688","1940","1940"],
["0","0","0","0","688","1940","5041"],
["0","0","0","0","688","1940","5041","5041"],
["0","0","0","165","165","165"],
["0","0","0","165","165","165","165"]]

remove_lines = []
index_to_be_removed = []
for x in range(len(list1)):
    for y in range(len(list1)):
        if len(list1[x]) > len(list1[y]):
            c = 0
            for i in range(len(list1[y])):
                if list1[x][i] == list1[y][i]:
                    c = c   1
                    if c == len(list1[y]):
                        if list1[x] not in remove_lines:
                            remove_lines.append(list1[x])
                            index_to_be_removed.append(x)

for x in range(len(index_to_be_removed)):
    print("lines to be removed:", list1[index_to_be_removed[x]])

Output of the lines which we want to remove:

lines to be removed: ['0', '0', '0', '0', '0', '0']
lines to be removed: ['0', '0', '0', '0', '0', '275']
lines to be removed: ['0', '0', '0', '0', '0', '275', '275']
lines to be removed: ['0', '0', '0', '0', '275', '275']
lines to be removed: ['0', '0', '0', '0', '275', '990']
lines to be removed: ['0', '0', '0', '0', '275', '990', '990']
lines to be removed: ['0', '0', '0', '0', '275', '990', '2761']
lines to be removed: ['0', '0', '0', '0', '275', '990', '2761', '2761']
lines to be removed: ['0', '0', '0', '0', '688', '688']
lines to be removed: ['0', '0', '0', '0', '688', '1940']
lines to be removed: ['0', '0', '0', '0', '688', '1940', '1940']
lines to be removed: ['0', '0', '0', '0', '688', '1940', '5041']
lines to be removed: ['0', '0', '0', '0', '688', '1940', '5041', '5041']
lines to be removed: ['0', '0', '0', '165', '165', '165', '165']

CodePudding user response：

this code is expected to be versatile and Fast.

Lists = [["0","0","0","0","0"], ["0","0","0","0","0","0"], ["0","0","0","0","0","275"], ["0","0","0","0","0","275","275"], ["0","0","0","0","275"], ["0","0","0","0","275","275"], ["0","0","0","0","275","990"], ["0","0","0","0","275","990","990"], ["0","0","0","0","275","990","2761"], ["0","0","0","0","275","990","2761","2761"], ["0","0","0","0","688"], ["0","0","0","0","688","688"], ["0","0","0","0","688","1940"], ["0","0","0","0","688","1940","1940"], ["0","0","0","0","688","1940","5041"], ["0","0","0","0","688","1940","5041","5041"], ["0","0","0","165","165","165"], ["0","0","0","165","165","165","165"]]


def Get_Uniques(Lists):
    Selected, Rejected = [], []
    for List in Lists :
        lenL, selected = len(List), True
        
        for ele in Selected:
            lenE = len(ele)
            if lenE > lenL: continue 
            if List[:lenE] == ele:
                Rejected.append(List)
                selected = False 
                break 
        if selected: Selected.append(List)
    
    return Selected, Rejected


Selected, Rejected = Get_Uniques(Lists)
print("Selected Lists : ", *Selected, "\nRejected Lists : ", *Rejected, sep="\n")

OUTPUT

Selected Lists : 
['0', '0', '0', '0', '0']
['0', '0', '0', '0', '275']
['0', '0', '0', '0', '688']
['0', '0', '0', '165', '165', '165']

Rejected Lists : 
['0', '0', '0', '0', '0', '0']
['0', '0', '0', '0', '0', '275']
['0', '0', '0', '0', '0', '275', '275']
['0', '0', '0', '0', '275', '275']
['0', '0', '0', '0', '275', '990']
['0', '0', '0', '0', '275', '990', '990']
['0', '0', '0', '0', '275', '990', '2761']
['0', '0', '0', '0', '275', '990', '2761', '2761']
['0', '0', '0', '0', '688', '688']
['0', '0', '0', '0', '688', '1940']
['0', '0', '0', '0', '688', '1940', '1940']
['0', '0', '0', '0', '688', '1940', '5041']
['0', '0', '0', '0', '688', '1940', '5041', '5041']
['0', '0', '0', '165', '165', '165', '165']

Hope this helps!

CodePudding user response：

I would do this through a pairwise comparison while loop. Basically, I first store all pairs of lists, (n lists becomes 0.5*n*(n-1) pairs of lists). Then, one at a time, I look at each pair, find the smaller list, and check if one is contained in the other using delimited-string comparison. Here's the code:

from itertools import combinations
lists = [["0","0","165","165"],...]
pairs = combinations(lists)
for pair in pairs:
    smaller = pair[0] if len(pair[0]) <= len(pair[1]) else pair[1]
    bigger = pair[1] if smaller == pair[0] else pair[0]
    smaller_str = ",".join(smaller)
    bigger_str = ",".join(bigger)
    if smaller_str in bigger_str and smaller in lists:
        lists.remove(smaller)
print(lists)

This ought to leave only those arrays which are not contained in any other arrays. I don't know if it's necessarily more efficient than your algorithm when you break down its component operations, but it's definitely a bit easier to follow.

CodePudding user response：

You could use a dictionary of tuples for the lists that you keep and use it to check the relevant prefix lengths of subsequent lists. This will give results in O(N) time, or more specifically O(NxD) where N is the number of lists and D is the length difference between the largest and smallest list.

keep   = dict()               # lists to keep (as tuples)
minLen = min(map(len,list1))  # shortest prefix length
for i,L in enumerate(list1): 
    prefixes = (tuple(L[:n]) for n in range(minLen,len(L) 1))
    if not any(sl in keep for sl in prefixes): # check all prefix lengths
        keep[tuple(L)]=i           # track lists that remain
    else:
        print("list to remove",L)  # list at index i will be dropped

# clean up
list1 = [*map(list,keep)]

Output:

list to remove ['0', '0', '0', '0', '0', '0']
list to remove ['0', '0', '0', '0', '0', '275']
list to remove ['0', '0', '0', '0', '0', '275', '275']
list to remove ['0', '0', '0', '0', '275', '275']
list to remove ['0', '0', '0', '0', '275', '990']
list to remove ['0', '0', '0', '0', '275', '990', '990']
list to remove ['0', '0', '0', '0', '275', '990', '2761']
list to remove ['0', '0', '0', '0', '275', '990', '2761', '2761']
list to remove ['0', '0', '0', '0', '688', '688']
list to remove ['0', '0', '0', '0', '688', '1940']
list to remove ['0', '0', '0', '0', '688', '1940', '1940']
list to remove ['0', '0', '0', '0', '688', '1940', '5041']
list to remove ['0', '0', '0', '0', '688', '1940', '5041', '5041']
list to remove ['0', '0', '0', '165', '165', '165', '165']


print(list1)
[['0', '0', '0', '0', '0'], 
 ['0', '0', '0', '0', '275'], 
 ['0', '0', '0', '0', '688'], 
 ['0', '0', '0', '165', '165', '165']]