I essentially have this dataset, and I'd like to sum the points of all items that share the same userID.
I know it's somewhere along the lines of for (a,b) in array, but I'm asking for assistance so I can learn.
[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}, {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}, {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278}, {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]
The result should be:
[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 786},
{'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}
]
Appreciate you guys and SO.
Answer:
I would sum the points into a single dictionary keyed by userID:
d = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
     {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]
# start every userID at 0
ans = dict.fromkeys((x["userID"] for x in d), 0)
for el in d:
    k, v = el["userID"], el["points"]
    ans[k] += v  # accumulate instead of overwriting
print(ans)
# {'QzzucRibfSahbUGwr2PGuhFSU242': 786, '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2': 268}
If you want a list of dicts at the end, as in the question, you can do:
[{"userID": k, "points": v} for k, v in ans.items()]
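For comparison, here is a minimal sketch of the same idea using collections.defaultdict from the standard library (a swapped-in variant, not the code above). Because a missing key starts at 0 automatically, the dict.fromkeys pre-pass is not needed:

```python
from collections import defaultdict

# Sample input copied from the question.
data = [
    {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
    {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
    {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
    {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
]

# defaultdict(int) returns 0 for any userID not seen yet.
totals = defaultdict(int)
for row in data:
    totals[row['userID']] += row['points']

# Convert back to the list-of-dicts shape from the question.
# Dicts keep insertion order (Python 3.7+), so users appear in first-seen order.
result = [{'userID': k, 'points': v} for k, v in totals.items()]
print(result)
```

Since defaultdict is a dict subclass, the same .items() conversion works unchanged.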
Answer:
Assuming l is the input list, you can use a dictionary to group by ID while aggregating, then convert back to a list:
out = {}
for d in l:
    if d['userID'] not in out:
        out[d['userID']] = d.copy()  # to not modify original
    else:
        out[d['userID']]['points'] += d['points']  # add, don't overwrite
out = list(out.values())
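To check that the d.copy() call really leaves the input intact, here is a small sketch that wraps the loop above in a function; the name merge_points is hypothetical and only used for illustration:

```python
def merge_points(rows):
    """Sum 'points' per 'userID' (hypothetical helper), keeping first-seen order."""
    out = {}
    for d in rows:
        if d['userID'] not in out:
            # Shallow copy, so the += below never touches the caller's dicts.
            out[d['userID']] = d.copy()
        else:
            out[d['userID']]['points'] += d['points']
    return list(out.values())


l = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
     {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

merged = merge_points(l)
print(merged)
print(l[0])  # the first input dict still has its original 254 points
```

Without the copy, the first dict for each user would be reused as the accumulator and its points would change in l as well.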
Alternative with setdefault and a dictionary template:
out = {}
for d in l:
    # setdefault inserts the template only the first time a userID is seen
    out.setdefault(d['userID'],
                   {'userID': d['userID'],
                    'points': 0,
                    })['points'] += d['points']
out = list(out.values())
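Another alternative, not part of the original answer, is itertools.groupby from the standard library. This is a minimal sketch, assuming it is fine for the output to come back sorted by userID rather than in first-seen order: groupby only merges adjacent items with equal keys, so the list has to be sorted first.

```python
from itertools import groupby
from operator import itemgetter

l = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
     {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

key = itemgetter('userID')
# Sort so that all rows for the same userID are adjacent.
rows = sorted(l, key=key)
# Each group is consumed by sum() before groupby advances to the next key.
out = [{'userID': uid, 'points': sum(r['points'] for r in group)}
       for uid, group in groupby(rows, key=key)]
print(out)
```

Here the '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2' entry comes first because the rows were sorted by userID before grouping.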
Output:
[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 786},
{'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}]
Answer:
pandas is useful for doing this kind of computation efficiently (here data is the input list):
import pandas as pd
points_by_id = pd.DataFrame(data).groupby('userID').sum()
print(points_by_id)
Output:
                              points
userID
5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2     268
QzzucRibfSahbUGwr2PGuhFSU242     786