Optimize finding and combining sublists in a list of lists in Python

Time:11-13

I have a list that can contain hundreds of thousands of lists of integers. For example:

```python
list_ref = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42], ...]
```

I need to build a new list in which each entry collects all the elements of the sublists that contain a given value. For example, new_list[0] should be a flat list of the elements of every sublist in which 0 appears.
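To make the intended output concrete, here is a small sketch for the sample data. The helper name `combined` is hypothetical, and returning a sorted, de-duplicated list is an assumption based on the `set` step in the question's code.

```python
sample = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42]]

def combined(value, lists):
    """Sorted, de-duplicated elements of every sublist that contains `value`."""
    return sorted({x for sub in lists if value in sub for x in sub})

print(combined(0, sample))  # [0, 5, 9]
print(combined(1, sample))  # [1, 2, 4, 7]
print(combined(6, sample))  # []
```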

A naive nested-loop version looks like this:

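```python
# list_ref: the input list of lists of ints
list_ref = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42]]

gr_cl = []
for i in range(len(list_ref)):
    # collect every sublist that contains the value i
    clust = []
    for j in list_ref:
        if i in j:
            clust.append(j)
    # flatten the matching sublists into a single list
    gr_cl.append([item for sublist in clust for item in sublist])

# remove duplicates from each flattened list
gr_cl_set = [list(set(item)) for item in gr_cl]
```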

I tried rewriting it as a list comprehension, but it is still far too slow.

Any ideas?

Answer:

Maybe, but the question is missing a constraint that limits the length of the output list.

The code below builds the result for every value up to the maximum value present in the input:

```python
from collections import defaultdict

inputlist = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42]]

# Create a dictionary in which:
#   keys:   values of the elements of the sublists
#   values: indices of the sublists of inputlist that contain the key
elt_refs = defaultdict(list)
for i, sublist in enumerate(inputlist):
    for elt in sublist:
        elt_refs[elt].append(i)

# Build the result by iterating over the dictionary in key order,
# padding with empty lists so that result[k] always belongs to value k
# (assumes all values are non-negative ints).
result = []
for k, refs in sorted(elt_refs.items()):
    # fill the gaps left by values that never occur
    while len(result) < k:
        result.append([])

    # flatten every sublist that contains k
    flat = []
    for ref in refs:
        flat.extend(inputlist[ref])

    # remove duplicates
    result.append(list(set(flat)))

print(result)
```
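A possible variation on the same dictionary idea, sketched under the assumption that the sublists are short and the values are non-negative integers: instead of storing indices and flattening afterwards, union each sublist directly into one set per value, in a single pass over the input. The names `group_by_value` and `to_list` are hypothetical, not part of the answer.

```python
from collections import defaultdict

def group_by_value(lists):
    """Map each value to the set of all elements that share a sublist with it."""
    groups = defaultdict(set)
    for sub in lists:
        members = set(sub)          # de-duplicate each sublist once
        for value in members:
            groups[value] |= members  # in-place union into this value's set
    return groups

def to_list(groups):
    """Dense list: entry k holds the sorted elements for value k, [] if absent.
    Assumes all keys are non-negative ints."""
    if not groups:
        return []
    size = max(groups) + 1
    return [sorted(groups.get(k, ())) for k in range(size)]

inputlist = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42]]
result = to_list(group_by_value(inputlist))
print(result[0])  # [0, 5, 9]
print(result[1])  # [1, 2, 4, 7]
```

The work per sublist grows with the square of its length, so this pays off when sublists are short, as in the example; for very long sublists the index-based version may be the better trade-off.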

Answer:

Why not put the lists in a dict? Then my_dict[n] is the list of indices of the sublists in the original list that contain n. That is likely faster, and you can still get the output you want (a list of indices).
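A minimal sketch of what that dict might look like, built in one pass over the input. The helper name `index_by_value` is hypothetical; `my_dict` follows the name used in the answer.

```python
from collections import defaultdict

def index_by_value(lists):
    """Map each value to the indices of the sublists that contain it."""
    my_dict = defaultdict(list)
    for idx, sub in enumerate(lists):
        for value in set(sub):  # set() so a repeated value records idx only once
            my_dict[value].append(idx)
    return my_dict

lists = [[0, 5, 9], [1, 2, 4], [1, 2, 7, 4], [3, 100, 42]]
my_dict = index_by_value(lists)
print(my_dict[1])  # [1, 2]

# The flattened, de-duplicated output for one value can be rebuilt from the indices
combined_4 = sorted({x for idx in my_dict[4] for x in lists[idx]})
print(combined_4)  # [1, 2, 4, 7]
```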

Answer:

You were pretty close to implementing it as a list comprehension, actually. This is the modified version:

```python
# sample numbers
list_of_nums = [[0,5,9], [1,2,4], [1,2,7,4], [3,100,42], [0,1,3], [1,2,3,4], [10,0,9]]
contains_zero = [sublist for sublist in list_of_nums if 0 in sublist]
print(contains_zero)  # outputs: [[0, 5, 9], [0, 1, 3], [10, 0, 9]]
```
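That comprehension handles a single value. A sketch of extending it to every value from 0 up to the largest one present, flattened and de-duplicated as in the question (the names `max_value` and `grouped` are illustrative). One caveat: it still scans the whole input once per value, so its running time grows with (number of values) × (input size), just like the original nested loop.

```python
list_of_nums = [[0,5,9], [1,2,4], [1,2,7,4], [3,100,42], [0,1,3], [1,2,3,4], [10,0,9]]

# largest value present, so the result has one entry per value 0..max_value
max_value = max(x for sub in list_of_nums for x in sub)

# entry i: sorted, de-duplicated elements of all sublists containing i
grouped = [
    sorted({x for sub in list_of_nums if i in sub for x in sub})
    for i in range(max_value + 1)
]
print(grouped[0])  # [0, 1, 3, 5, 9, 10]
```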