Save a list's index within a list of lists only if it contains certain elements in Python

I have a list of lists such as:

my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]

As you can see, lists 1 and 2 share the most elements (4). So I want to keep a list from my_list_of_list only if at least one of those 4 shared elements (A, B, C or E) is present in it.

I then save in list_shared_number only the indices of lists 1, 2, 3 and 6 (0-based: 0, 1, 2 and 5), since the others do not contain any of A, B, C or E.

Expected output:

print(list_shared_number)
[0,1,2,5]

CodePudding user response：

Probably suboptimal because I need to iterate over the lists 3 times, but it gives the expected result:

from itertools import combinations
from functools import reduce

common_elements = [set(i).intersection(j) 
                       for i, j in combinations(my_list_of_list, r=2)]

common_element = reduce(lambda i, j: i if len(i) >= len(j) else j, common_elements)

list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                          if common_element.intersection(l)]
print(list_shared_number)

# Output
[0, 1, 2, 5]

Alternative with 2 iterations:

common_element = set()  # start from an empty set; {} would be a dict
for i, j in combinations(my_list_of_list, r=2):
    c = set(i).intersection(j)
    common_element = c if len(c) > len(common_element) else common_element
list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                          if common_element.intersection(l)]
print(list_shared_number)

# Output
[0, 1, 2, 5]
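
As a possible single-function variant of the approach above, here is a minimal sketch that swaps reduce for max with key=len. The function name indices_sharing_best_overlap is hypothetical (not from the answer), and default=set() is an assumption about how fewer than two lists should be handled:

```python
from itertools import combinations


def indices_sharing_best_overlap(lists):
    """Hypothetical helper: indices of the lists that contain at least one
    element of the largest pairwise intersection."""
    # Largest intersection over all pairs; on ties the first pair wins.
    # default=set() covers the case of fewer than two lists (an assumption).
    best = max(
        (set(a) & set(b) for a, b in combinations(lists, 2)),
        key=len,
        default=set(),
    )
    # Keep the index of every list sharing at least one element with it.
    return [idx for idx, lst in enumerate(lists) if not best.isdisjoint(lst)]


my_list_of_list = [['A', 'B', 'C', 'E'], ['A', 'B', 'C', 'E', 'F'],
                   ['D', 'G', 'A'], ['X', 'Z'], ['D', 'M'], ['B', 'G'],
                   ['X', 'Z']]
print(indices_sharing_best_overlap(my_list_of_list))  # [0, 1, 2, 5]
```

With an empty input, or a single list, this sketch returns an empty list instead of raising an error.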

CodePudding user response：

You can use itertools.combinations and set operations.

In the first line, you find the largest intersection among all pairs of lists. In the second line, you iterate over my_list_of_list to identify the lists that contain at least one element of the set you found in the first line.

from itertools import combinations
comparison = max(map(lambda x: (len(set(x[0]).intersection(x[1])), set(x[0]).intersection(x[1])), combinations(my_list_of_list, 2)))[1]
out = [i for i, lst in enumerate(my_list_of_list) if comparison - set(lst) != comparison]

Output:

[0, 1, 2, 5]
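
The test comparison - set(lst) != comparison may not be obvious at first sight: removing the elements of lst changes comparison only if the two have something in common. A small sketch, assuming comparison holds the set found for the question's data, that places this test next to two equivalent forms (overlap_checks is a hypothetical helper name):

```python
def overlap_checks(comparison, lst):
    """Hypothetical helper: three equivalent 'shares an element' tests."""
    s = set(lst)
    by_difference = comparison - s != comparison  # form used in the answer
    by_intersection = bool(comparison & s)        # non-empty intersection
    by_isdisjoint = not comparison.isdisjoint(s)  # no temporary set is built
    return by_difference, by_intersection, by_isdisjoint


comparison = {'A', 'B', 'C', 'E'}
print(overlap_checks(comparison, ['D', 'G', 'A']))  # (True, True, True)
print(overlap_checks(comparison, ['X', 'Z']))       # (False, False, False)
```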

CodePudding user response：

You can find the shared elements with a list comprehension. This assumes the lists at index 0 and index 1 are the pair that shares the most elements:

share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
print(share)

For each inner list x, [j for j in x if j in share] collects the elements of x that are also in share. If the length of that list is greater than 0, the index of x should be included in the output.

So the final code looks like this:

share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
my_list = [i for i, x in enumerate(my_list_of_list) if len([j for j in x if j in share]) > 0]
print(my_list)
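
A minimal sketch of the same idea using any() and a set, which stops at the first shared element instead of building a temporary list. The function name indices_sharing_with_first_two is hypothetical. It also illustrates that the result depends on the first two lists being the pair with the largest overlap:

```python
def indices_sharing_with_first_two(lists):
    """Hypothetical helper: same logic as the answer, with a set and any()."""
    # Shared elements of the first two lists only (the answer's assumption).
    share = set(lists[0]) & set(lists[1])
    # any() short-circuits as soon as one element of x is found in share.
    return [i for i, x in enumerate(lists) if any(j in share for j in x)]


my_list_of_list = [['A', 'B', 'C', 'E'], ['A', 'B', 'C', 'E', 'F'],
                   ['D', 'G', 'A'], ['X', 'Z'], ['D', 'M'], ['B', 'G'],
                   ['X', 'Z']]
print(indices_sharing_with_first_two(my_list_of_list))  # [0, 1, 2, 5]

# Reversed order: the first two lists (['X','Z'] and ['B','G']) share nothing,
# so no index is kept.
print(indices_sharing_with_first_two(my_list_of_list[::-1]))  # []
```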

CodePudding user response：

Oh boy, so mine is a bit messy, however I did not use any imports AND I included the initial "finding" of the two lists which have the most in common with one another. This can easily be optimised but it does do exactly what you wanted.

my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]
my_list_of_list = list(map(set,my_list_of_list))
mostIntersects = [0, (None,)]

for i, IndSet in enumerate(my_list_of_list):
    for j in range(i + 1, len(my_list_of_list)):
        intersects = len(IndSet.intersection(my_list_of_list[j]))
        if intersects > mostIntersects[0]: mostIntersects = [intersects, (i,j)]
FinalIntersection = set(my_list_of_list[mostIntersects[1][0]]).intersection(my_list_of_list[mostIntersects[1][1]])

skipIndexes = set(mostIntersects[1])
for i,sub_list in enumerate(my_list_of_list): 
    [skipIndexes.add(i) for char in sub_list 
        if i not in skipIndexes and char in FinalIntersection]

print(*map(list,(mostIntersects, FinalIntersection, skipIndexes)), sep = '\n')

The print gives this:

[4, (0, 1)]
['E', 'C', 'B', 'A']
[0, 1, 2, 5]

This works by first converting the lists to sets using the map function (the result has to be turned back into a list so I can use len and iterate over it more than once). I then intersect each set with every later set in the list of lists and count how many elements each intersection contains. Each time I find one with a larger count, I set mostIntersects to that count and the pair of indexes. Once I have gone through them all, I take the sets at the two indexes (0 and 1 in this case) and intersect them to get the elements A, B, C, E (variable FinalIntersection). From there, I iterate over all lists whose index is not already in skipIndexes and check whether any of their elements is found in FinalIntersection. If one is, the index of the list is added to skipIndexes. This results in the final set of indexes (indices) that you were after. Technically the result is a set, but to convert it back you can use list({0,1,2,5}), which will give you the value you were after.
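
For the last step, here is a small sketch, assuming the shared elements are already known to be A, B, C, E. It collects the matching indexes with a set comprehension and converts them with sorted(), since the iteration order of a set is not something to rely on. The names sets, matching and result are illustrative only:

```python
my_list_of_list = [['A', 'B', 'C', 'E'], ['A', 'B', 'C', 'E', 'F'],
                   ['D', 'G', 'A'], ['X', 'Z'], ['D', 'M'], ['B', 'G'],
                   ['X', 'Z']]
final_intersection = {'A', 'B', 'C', 'E'}  # assumed result of the pair search

sets = [set(sub_list) for sub_list in my_list_of_list]
# Index of every set that shares at least one element with final_intersection.
matching = {i for i, s in enumerate(sets) if s & final_intersection}
# sorted() returns a list in ascending order regardless of set ordering.
result = sorted(matching)
print(result)  # [0, 1, 2, 5]
```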