Python list/dict comprehension summing a dict list key by another key in the same dict

Time:05-28

Been thinking how to convert this to a one liner if possible:

activities = [
    {'type': 'Run', 'distance': 12345, 'other_stuff': other ...},
    {'type': 'Ride', 'distance': 12345, 'other_stuff': other ...},
    {'type': 'Swim', 'distance': 12345, 'other_stuff': other ...},
]

Currently I am using:

from collections import defaultdict

grouped_distance = defaultdict(int)
for activity in activities:
    act_type = activity['type']
    # Accumulate the distance per activity type
    grouped_distance[act_type] += activity['distance']

# {'Run': 12345, 'Ride': 12345, 'Swim': 12345} 

I have tried:
grouped_distance = {activity['type']:[sum(activity['distance']) for activity in activities]}
This does not work; it says activity['type'] is not defined.

Edit: Fixed some variable typos, as noticed by @Samwise.

Update: I ran a benchmark on all the solutions that were posted, using 10 million items with 10 different types:

Method 1 (Counter): 7.43s
Method 2 (itertools @chepner): 8.64s
Method 3 (groups @Dmig): 19.34s
Method 4 (pandas @d.b): 32.73s
Method 5 (Dict @d.b): 10.95s

Tested on a Raspberry Pi 4 to make the differences easier to see. Do correct me if I have "named" a method wrongly.
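The code behind "Method 1 (Counter)" is not shown in this thread. A minimal sketch of what such an approach could look like, assuming it accumulates distances into a collections.Counter keyed by type (the sample data is made up for illustration and is not the benchmark data):

```python
from collections import Counter

# Hypothetical sample data, not taken from the benchmark.
activities = [
    {'type': 'Run', 'distance': 4},
    {'type': 'Run', 'distance': 3},
    {'type': 'Swim', 'distance': 5},
]

grouped_distance = Counter()
for activity in activities:
    # Missing keys start at 0, so no explicit initialisation is needed.
    grouped_distance[activity['type']] += activity['distance']

print(dict(grouped_distance))  # {'Run': 7, 'Swim': 5}
```

This is essentially the defaultdict(int) loop with a different container, which may be why the two perform similarly.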

Thank you everyone. @Dmig, @Mark and @juanpa.arrivillaga have piqued my interest in performance. Shorter/Neater ≠ Higher Performance. I only wanted to ask whether I could write it as a one-liner to make it look neater, but I have learnt a lot more than that.

CodePudding user response:

Your solution is good as it is, but if you really want a one-liner:

act = [{'type': 'run', 'distance': 4}, {'type': 'run', 'distance': 3}, {'type': 'swim', 'distance': 5}]

groups = {
  t: sum(i['distance'] for i in act if i['type'] == t)
  for t in {i['type'] for i in act}  # set with all possible activities
}

print(groups)  # {'run': 7, 'swim': 5}

UPD: I did some performance testing, comparing this answer to the one that uses groupby(sorted(...)). On ten million entries with 10 different types, this approach loses to groupby(sorted(...)), taking 18.14 seconds against 10.12. So, while it is more readable, it is less efficient on bigger lists, especially with more distinct types (because it iterates over the initial list once per distinct type).

But take note: the straightforward loop from the question takes only 5 seconds!

This answer only shows a one-liner for educational purposes; the solution from the question has much better performance. You should not use this instead of the one in the question unless, as I said, you really want or need a one-liner.
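For anyone who wants to repeat the comparison, here is a rough timing sketch. It is not the harness used for the numbers above: the data is synthetic, the list is far smaller (100,000 entries), and the function names loop_version and one_liner_version are made up for illustration.

```python
import random
import timeit
from collections import defaultdict

# Hypothetical synthetic data: 100,000 entries spread over 10 types.
random.seed(0)
types = [f'type{n}' for n in range(10)]
act = [
    {'type': random.choice(types), 'distance': random.randint(1, 100)}
    for _ in range(100_000)
]

def loop_version():
    # Single pass over the list, as in the question.
    totals = defaultdict(int)
    for i in act:
        totals[i['type']] += i['distance']
    return dict(totals)

def one_liner_version():
    # One pass to collect the types, then one pass per distinct type.
    return {
        t: sum(i['distance'] for i in act if i['type'] == t)
        for t in {i['type'] for i in act}
    }

# Both approaches must agree before timing them.
assert loop_version() == one_liner_version()

for fn in (loop_version, one_liner_version):
    print(fn.__name__, timeit.timeit(fn, number=3))
```

Absolute timings will differ by machine; the point is only the relative gap, which should widen as the number of distinct types grows.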

CodePudding user response:

Use itertools.groupby.

from itertools import groupby
from operator import itemgetter


by_type = itemgetter('type')
distance = itemgetter('distance')
# groupby only groups adjacent items, so sort by type first
result = {
    k: sum(map(distance, v))
    for k, v in groupby(sorted(activities, key=by_type), by_type)
}

When iterating over the groupby instance, k will be one of the activity types, and v will be an iterable of activities having type k.
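The sorted() call matters because groupby only merges consecutive items with equal keys. A small sketch with made-up data (the list acts is hypothetical) showing the difference:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data where the two 'run' entries are not adjacent.
acts = [
    {'type': 'run', 'distance': 4},
    {'type': 'swim', 'distance': 5},
    {'type': 'run', 'distance': 3},
]
by_type = itemgetter('type')
distance = itemgetter('distance')

# Without sorting, 'run' shows up as two separate groups.
unsorted_groups = [(k, sum(map(distance, v))) for k, v in groupby(acts, by_type)]
print(unsorted_groups)  # [('run', 4), ('swim', 5), ('run', 3)]

# With sorting, equal types are adjacent and each type is summed once.
sorted_totals = {
    k: sum(map(distance, v))
    for k, v in groupby(sorted(acts, key=by_type), by_type)
}
print(sorted_totals)  # {'run': 7, 'swim': 5}
```

If the unsorted groups were fed into a dict comprehension, the later ('run', 3) would silently overwrite the earlier ('run', 4).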

CodePudding user response:

d = {}
for x in activities: d.__setitem__(x["type"], d.get(x["type"], 0) + x["distance"])
d
# {'Run': 12345, 'Ride': 12345, 'Swim': 12345}