dictionary group by substring of key

Time:11-02

In Python 3, I have a dictionary {key = episode: value = count} and I can't figure out how to group by a substring of the keys, summing the values of each group.

input:

dict = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

Wanted output:

output_dict = {'S01': 188 , 'S02' : 178}

I've tried building an intermediary list of seasons and using reduce and Counter, with no success.

List = ['S01', 'S02']

Also tried looking for any results in here but couldn't find anything. Wrong terminology probably. Any help would be appreciated. Thanks

CodePudding user response:

The answer by Onyambu is probably the more pythonic way to solve this problem, but if you're looking for a more human readable solution that fits this specific use case then you can do something like this:

episodes = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

output = {}

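for episode in episodes:
    season = episode[0:3]  # the first 3 characters, e.g. 'S01'
    if season not in output:
        # first episode seen for this season: start the total
        output[season] = episodes[episode]
    else:
        # season already present: add to the running total
        output[season] += episodes[episode]
print(output)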
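
Since the question mentions trying Counter, here is a minimal sketch of the same loop with collections.Counter, which treats missing keys as 0 and so needs no if/else. It assumes, like the loop above, that the season is always the first three characters of the key; the name season_totals is just made up for illustration.

```python
from collections import Counter

episodes = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
            'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

# A Counter returns 0 for keys it has not seen yet,
# so += works even on the first episode of a season.
season_totals = Counter()
for episode, count in episodes.items():
    season_totals[episode[:3]] += count  # assumes a 3-character season prefix

print(dict(season_totals))  # {'S01': 188, 'S02': 178}
```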

CodePudding user response:

Use dict comprehension:

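from itertools import groupby

# d is the input dictionary; groupby only merges adjacent items,
# so the keys must already be ordered by season (as they are here)
{key: sum(val for _, val in group) for key, group in groupby(d.items(), key=lambda x: x[0][:3])}
Out: {'S01': 188, 'S02': 178}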

Using a normal for loop: first save your data as d, then delete the name dict, since it shadows the built-in function (i.e. del dict). Now you can run the following code:

result = dict()

for key, val in d.items():
    var1 = key[:3]
    if not result.get(var1):
        result[var1] = 0
    result[var1] += val

CodePudding user response:

I am assuming the subkey is only 3 characters long.

dic = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

First extract unique subkeys:

subkeys = {key[:3] for key in dic}

Afterwards, use a dictionary comprehension to sum up values for each subkey.

out = {subkey: sum(value for key, value in dic.items() if key.startswith(subkey)) for subkey in subkeys}

Uglier one-liner:

out = {subkey[:3]: sum(value for key, value in dic.items() if key.startswith(subkey[:3])) for subkey in dic}
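
If the 3-character assumption might not hold (say, a show that reaches season 100), one possible way around it is to pull the season out with a regular expression. This is only a sketch: the pattern assumes keys look like S<digits>E<digits>, and sum_by_season and the sample keys in the usage line are hypothetical, not from the question.

```python
import re
from collections import defaultdict

# Assumed key format: 'S' + season digits + 'E' + episode digits.
SEASON_PATTERN = re.compile(r"(S\d+)E\d+")

def sum_by_season(counts):
    """Sum the values of `counts`, grouped by the season part of each key."""
    totals = defaultdict(int)
    for key, value in counts.items():
        match = SEASON_PATTERN.fullmatch(key)
        if match is None:
            # Fail loudly rather than silently mis-group an odd key.
            raise ValueError(f"unexpected key format: {key!r}")
        totals[match.group(1)] += value
    return dict(totals)

print(sum_by_season({'S01E01': 27, 'S01E02': 27, 'S10E01': 5, 'S100E03': 4}))
# {'S01': 54, 'S10': 5, 'S100': 4}
```

With slicing, 'S100E03' would have been counted under 'S10'; matching on the full pattern keeps the two seasons apart.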

CodePudding user response:

Another approach:

data = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}
from itertools import groupby
out = {}
# iterating over a dict yields its keys; group adjacent keys by their first 3 characters
for key, group in groupby(data, lambda x: x[:3]):
    out[key] = sum(data[episode] for episode in group)
print(out)

Output:

{'S01': 188, 'S02': 178}
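
One caveat with the groupby approaches: itertools.groupby only groups consecutive items, so they rely on the keys already being ordered by season. A small sketch of what goes wrong otherwise, and of sorting first as a fix. The shuffled dictionary below is made-up sample data reusing four values from the question, and season_of is a hypothetical helper name.

```python
from itertools import groupby

def season_of(item):
    """Return the season prefix of a (key, value) pair, assuming 3 characters."""
    return item[0][:3]

# Hypothetical input where the seasons are interleaved.
shuffled = {'S02E01': 33, 'S01E01': 27, 'S02E02': 21, 'S01E03': 32}

# Without sorting, each season shows up as several separate groups,
# and later groups overwrite earlier ones in the resulting dict.
unsorted_totals = {season: sum(value for _, value in group)
                   for season, group in groupby(shuffled.items(), key=season_of)}
print(unsorted_totals)  # {'S02': 21, 'S01': 32}

# Sorting the items first makes every season one contiguous run.
sorted_totals = {season: sum(value for _, value in group)
                 for season, group in groupby(sorted(shuffled.items()), key=season_of)}
print(sorted_totals)  # {'S01': 59, 'S02': 54}
```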