Python list of lists to dict when a key appears many times

Time:03-18

I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.

input:

lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]

desired output:

d = {1 : ["txt1", "txt2"], 2 : "txt3"}

Is there something built into Python that makes dict() extend the value for a key instead of replacing it?

dict(list(zip(lst[0], lst[1])))
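
For reference, a quick check of what that one-liner returns, as a minimal sketch using the sample input above: a later value replaces the earlier one for a repeated key, so "txt1" is lost.

```python
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]

# dict() keeps only the last value seen for each key
d = dict(zip(lst[0], lst[1]))
print(d)  # {1: 'txt2', 2: 'txt3'}
```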

CodePudding user response:

One option is to use dict.setdefault:

out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)

Output:

{1: ['txt1', 'txt2'], 2: ['txt3']}

If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:

out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v

or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:

from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])  # number of consecutive occurrences of k
    if length > 1:
        out[k] = lst[1][pos:pos + length]
    else:
        out[k] = lst[1][pos]
    pos += length

Output:

{1: ['txt1', 'txt2'], 2: 'txt3'}

But as @timgeb notes, this is probably not what you want: afterwards, you have to check the data type (list or not) every time you access the dictionary, which is an unnecessary problem that you can avoid by keeping all values as lists.
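
A note on the sorted requirement for groupby: it only merges consecutive equal keys, so an unsorted lst[0] would yield the same key more than once and later groups would overwrite earlier ones. Below is a minimal sketch of one workaround, assuming it is acceptable to sort the pairs first. The input here is made up for illustration, and the sort adds O(n log n) work, so the setdefault loop is likely the better fit for unsorted data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input, not taken from the question
lst = [[2, 1, 1, 2], ["a", "b", "c", "d"]]

# Sort the (key, value) pairs by key only; sorted() is stable, so values
# keep their original relative order within each key
pairs = sorted(zip(*lst), key=itemgetter(0))

# Every value is a list here, including singletons
out = {k: [v for _, v in grp] for k, grp in groupby(pairs, key=itemgetter(0))}
print(out)  # {1: ['b', 'c'], 2: ['a', 'd']}
```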

CodePudding user response:

If you're dealing with large datasets, a pandas solution may be useful.

>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s 
1    txt1
1    txt2
2    txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}

Note that this also produces lists for single elements (e.g. ['txt3']), which I highly recommend. Having both lists and strings as possible values will result in bugs because both types are iterable, so you'd need to remember to check the type each time you process a dict value.
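
To illustrate the kind of bug meant here, a small sketch with a mixed-type dictionary like the one the question asks for. The as_list helper is hypothetical, shown only to make the extra type check visible.

```python
d = {1: ["txt1", "txt2"], 2: "txt3"}  # mixed list / str values

# Iterating over the bare string yields characters, not one item
print([item for item in d[2]])  # ['t', 'x', 't', '3']

# Hypothetical helper: the check you would need on every access
def as_list(value):
    return value if isinstance(value, list) else [value]

print(as_list(d[2]))  # ['txt3']
print(as_list(d[1]))  # ['txt1', 'txt2']
```

With all values stored as lists, neither the check nor the helper is needed.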

CodePudding user response:

You can use a defaultdict to group the strings by their corresponding key, then make a second pass over the result to extract the strings from singleton lists. Whatever you do, you need to access every element in both lists at least once, so some form of iteration is necessary (even if you don't write the loop explicitly, whatever you call will almost certainly iterate under the hood):

from collections import defaultdict

lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]

result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)

for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]

print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}
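
Since the question is about speed, it may be worth measuring the loop variants on your own data before reaching for anything heavier. A rough timing sketch follows, assuming synthetic data (100,000 pairs over 1,000 keys, chosen arbitrarily) and hypothetical function names. Timings vary by machine and Python version, so none are quoted here.

```python
from collections import defaultdict
from timeit import timeit
import random

# Synthetic data (assumption): 100,000 pairs spread over 1,000 keys
random.seed(0)
keys = [random.randrange(1000) for _ in range(100_000)]
values = [f"txt{i}" for i in range(100_000)]

def with_setdefault():
    out = {}
    for k, v in zip(keys, values):
        out.setdefault(k, []).append(v)
    return out

def with_defaultdict():
    out = defaultdict(list)
    for k, v in zip(keys, values):
        out[k].append(v)
    return dict(out)

# Both approaches must agree before their timings are worth comparing
assert with_setdefault() == with_defaultdict()

for fn in (with_setdefault, with_defaultdict):
    print(fn.__name__, timeit(fn, number=3))
```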