Dictionary group by substring of key

In Python 3, I have a dictionary {key = episode: value = count} and I can't figure out how to group by a substring of the keys, summing the values in each group.

input:

dict = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

Wanted output:

output_dict = {'S01': 188 , 'S02' : 178}

I've tried building an intermediate list of seasons and using the reduce and Counter functions, with no success.

List = ['S01', 'S02']

I also searched here for similar questions but couldn't find anything, probably because I'm using the wrong terminology. Any help would be appreciated. Thanks

CodePudding user response：

The answer by Onyambu is probably the more Pythonic way to solve this problem, but if you're looking for a more human-readable solution that fits this specific use case, you can do something like this:

episodes = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

output = {}

for episode in episodes:
    season = episode[0:3]  # Gets the first 3 characters, e.g. 'S01'
    if season not in output:
        output[season] = episodes[episode]   # first episode seen for this season
    else:
        output[season] += episodes[episode]  # add to the running total
print(output)
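
The if/else in the loop above can also be folded away with collections.defaultdict, which supplies a starting value of 0 the first time a season is seen. This is only a sketch of the same loop; the name totals is chosen here for illustration.

```python
from collections import defaultdict

episodes = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
            'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

totals = defaultdict(int)  # a missing season starts at int() == 0
for episode, count in episodes.items():
    totals[episode[:3]] += count  # first 3 characters are the season, e.g. 'S01'

print(dict(totals))  # {'S01': 188, 'S02': 178}
```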

CodePudding user response：

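Use a dict comprehension with itertools.groupby. Here d is the input dictionary. Note that groupby only merges adjacent items, so the keys must already be ordered by season, as they are in the question:

 from itertools import groupby

 {key: sum(v for _, v in val) for key, val in groupby(d.items(), key=lambda x: x[0][:3])}
 Out: {'S01': 188, 'S02': 178}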

Alternatively, use a normal for loop. First save your data as d instead of dict, because dict is a built-in type. If you have already assigned to the name dict, run del dict to restore the built-in. Then you can run the following code:

result = dict()

for key, val in d.items():
    var1 = key[:3]
    if not result.get(var1):
        result[var1] = 0
    result[var1] += val
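
Since the question mentions trying Counter, here is a rough sketch of how the same loop might look with collections.Counter. A Counter treats missing keys as 0, so no initialisation step is needed. The variable names follow the loop above.

```python
from collections import Counter

d = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
     'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

result = Counter()
for key, val in d.items():
    result[key[:3]] += val  # missing seasons count as 0

print(dict(result))           # {'S01': 188, 'S02': 178}
print(result.most_common(1))  # [('S01', 188)]
```

A Counter also gives you most_common() for free if you later want the seasons ranked by total.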

CodePudding user response：

I am assuming the subkey is only 3 characters long.

dic = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}

First extract unique subkeys:

subkeys = {key[:3] for key in dic}

Afterwards, use a dictionary comprehension to sum up values for each subkey.

out = {subkey: sum(value for key, value in dic.items() if key.startswith(subkey)) for subkey in subkeys}

Uglier one-liner:

out = {subkey[:3]: sum(value for key, value in dic.items() if key.startswith(subkey[:3])) for subkey in dic}
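
If the 3-character assumption does not hold (for example, seasons numbered 10 or 100), one possible variation is to take everything before the first 'E' as the season. The sketch below assumes every key has the form S<number>E<number>; the helper season_of and the sample data are hypothetical and not from the question.

```python
def season_of(key):
    """Return the part of the key before the first 'E', e.g. 'S10' for 'S10E02'."""
    return key.partition('E')[0]

# Hypothetical data with season labels of different lengths
dic = {'S09E01': 10, 'S10E01': 20, 'S10E02': 5, 'S100E01': 7}

out = {}
for key, value in dic.items():
    season = season_of(key)
    out[season] = out.get(season, 0) + value  # single pass over the dictionary

print(out)  # {'S09': 10, 'S10': 25, 'S100': 7}
```

Comparing whole season labels also avoids counting 'S100E01' under 'S10', which a plain substring or prefix test would do.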

CodePudding user response：

Another approach:

data = {'S01E01': 27, 'S01E02': 27, 'S01E03': 32, 'S01E04': 36, 'S01E05': 35, 'S01E06': 31,
 'S02E01': 33, 'S02E02': 21, 'S02E03': 20, 'S02E04': 29, 'S02E05': 33, 'S02E06': 42}
from itertools import groupby

out = {}
# groupby only merges adjacent keys, so this relies on the keys being ordered by season
for key, group in groupby(data, lambda x: x[:3]):
    out[key] = sum(data[episode] for episode in group)
print(out)

Output:

{'S01': 188, 'S02': 178}
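
One caveat with groupby is that it only groups consecutive items, so it gives the result above only because the keys are already in season order. The sketch below uses a small, deliberately shuffled dictionary (hypothetical data, not from the question) to show the difference that sorting the keys first makes.

```python
from itertools import groupby

# Hypothetical data: same key format, but seasons are interleaved
shuffled = {'S02E01': 33, 'S01E01': 27, 'S02E02': 21, 'S01E02': 27}

def season(key):
    return key[:3]

# Without sorting, each season appears as several separate groups,
# and later groups overwrite earlier ones in the dict.
unsorted_out = {k: sum(shuffled[e] for e in g) for k, g in groupby(shuffled, season)}

# Sorting the keys first makes each season one contiguous group.
sorted_out = {k: sum(shuffled[e] for e in g) for k, g in groupby(sorted(shuffled), season)}

print(unsorted_out)  # {'S02': 21, 'S01': 27}
print(sorted_out)    # {'S01': 54, 'S02': 54}
```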