Home > OS >  How to get common elements in a deep nested list: my two solutions work but take some time
How to get common elements in a deep nested list: my two solutions work but take some time

Time:04-15

I have a nested list structure as below. Each of the 4 nested structures represents some free positions for me. I want to find which elements are present in all 4 nested lists.

ary=[   [[0, 4], [5, 11]],    [[0, 2], [0, 4], [5,10]],  [[0, 4], [0, 14], [5,11]],  [[0, 4], [0, 14], [5,11]]  ]

As in above, in the first nested list [[0, 4], [5, 11]], the [0,4] is present in all but [5,11] is not. Hence, my answer should be [[0,4]] or even just [0,4].

I did this in two ways. Solution1:

 ary1=ary
 newlist = [item for items in ary for item in items]
 x=[i for i in ary[0] if newlist.count(i)== len(ary1)]  

 #OUTPUT is x= [[0,4]]

Solution2:

x=[]    
for u in ary[0]:
    n=[]        
    n=[1 for t in range(1,len(ary)) if u in ary[t]] 
    if len(ary)-1==len(n):  
       x.append(u)

#OUTPUT is x= [[0,4]]

These two seem to take similar computational time when checked using line profiler. And this is the only point of heavy computation in my hundreds of liens of code and I want to reduce this. So, can you suggest any other Python commands/code that can do the task better than these two solutions?

CodePudding user response:

You can try to convert each nested array at the second level into the set of tuples, where each lowest level array (i.e. [0,4]) is an element of the set. The conversion into tuples is required because lists are not hashable. Once you have each nested list of lists as a set, simply find their intersection.

set.intersection(*[set(tuple(elem) for elem in sublist) for sublist in ary])

CodePudding user response:

I'm not suggesting to do it like this, I just came up with another way so you could compare it. Maybe it is a lucky shot and need less computation than your topical ways.

flatten_list = [tuple(item) for sublist in ary for item in sublist]
max(set(flatten_list), key = flatten_list.count)

This requires that there is always one element which is present in every nested list because I don't check it explicitly.

CodePudding user response:

Two approaches that I can think of

  1. Depending on how intensive you are willing to take this and how big the target list is going to be, it may need a little bit of time and testing. If so, consider option 2. But my theory is to turn it into a binary based approach as opposed to iterating thru each of the direct elements of the outer array. May need the use of itertools.tee() and/or multithreading with a recursive function (depending on the length of the outer list). The recursive function will split the list by half in every iteration, until it is determined splits are small enough in length to start ruling out uncommon elements (like [5-11]). Then common elements are passed back up the recursion hierarchy. Designing this more closely should help assess conditions/threshold to avoid runaway conditions like excessive thread counts

  2. It seems that all third level lists (e.g., [[0, 2], [0, 4], [5,10]]) are sorted. If not, then sort them, eliminate (pop) duplicates, and then merge them all together using operator, and then resort. After that you will end up with a structure containing as many [0,4]'s as the length of ary1. That could be your condition for identifying [0,4] as the answer. That again may need to be tested

  • Related