Merge another field with duplicate IDs (Python)

I essentially have this dataset, and I'd like to add/combine the points of each item where the userID is a duplicate.

I know it's somewhere along the lines of for (a, b) in array, but I'm asking for assistance while I learn.

[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}, {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}, {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278}, {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

Result should be

[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 786},
 {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}
]

Appreciate you guys and SO.

CodePudding user response：

I would sum the points into a single dictionary keyed by userID:

d = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
     {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

ans = dict.fromkeys((x["userID"] for x in d), 0)

for el in d:
    k, v = el["userID"], el["points"]
    ans[k] += v  # accumulate points for this userID

print(ans)
# {'QzzucRibfSahbUGwr2PGuhFSU242': 786, '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2': 268}

If you want a list at the end, you can convert the dictionary back:

[{"userID": k, "points": v} for k, v in ans.items()]
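The same accumulation can be written without pre-filling the keys. The following is a minimal sketch, assuming a helper named merge_points (a hypothetical name, not from the answer above) and using collections.defaultdict so that a missing userID starts at 0:

```python
from collections import defaultdict


def merge_points(records):
    """Sum 'points' per 'userID', keeping users in order of first appearance."""
    totals = defaultdict(int)  # a missing key starts at 0
    for rec in records:
        totals[rec["userID"]] += rec["points"]
    # Convert the {userID: total} mapping back to a list of dicts
    return [{"userID": k, "points": v} for k, v in totals.items()]


d = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
     {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
     {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

merged = merge_points(d)
print(merged)
```

This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.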

CodePudding user response：

Assuming l is the input list, you can use a dictionary to group by ID while aggregating, then convert back to a list:

out = {}

for d in l:
    if d['userID'] not in out:
        out[d['userID']] = d.copy() # to not modify original
    else:
        out[d['userID']]['points'] += d['points']

out = list(out.values())

Alternative with setdefault and a dictionary template:

out = {}

for d in l:
    out.setdefault(d['userID'],
                   {'userID': d['userID'],
                    'points': 0,
                    })['points'] += d['points']

out = list(out.values())

Output:

[{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 786},
 {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268}]
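To see why the first variant calls d.copy(), here is a small sketch comparing it with a version that stores the input dict directly. The function names and the short sample IDs ('u1') are made up for illustration:

```python
import copy


def merge_with_copy(records):
    out = {}
    for d in records:
        if d['userID'] not in out:
            out[d['userID']] = d.copy()  # independent dict, input stays intact
        else:
            out[d['userID']]['points'] += d['points']
    return list(out.values())


def merge_without_copy(records):
    out = {}
    for d in records:
        if d['userID'] not in out:
            out[d['userID']] = d  # same object as in the input list
        else:
            out[d['userID']]['points'] += d['points']
    return list(out.values())


# Illustrative data, not the dataset from the question
sample = [{'userID': 'u1', 'points': 1}, {'userID': 'u1', 'points': 2}]

safe_input = copy.deepcopy(sample)
merge_with_copy(safe_input)
print(safe_input[0])    # {'userID': 'u1', 'points': 1}

unsafe_input = copy.deepcopy(sample)
merge_without_copy(unsafe_input)
print(unsafe_input[0])  # {'userID': 'u1', 'points': 3}
```

Without the copy, the first entry for each user in the original list is overwritten with the running total, which matters if the list is used again later.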

CodePudding user response：

Pandas is useful for doing this kind of computation efficiently (here data is the input list):

import pandas as pd

points_by_id = pd.DataFrame(data).groupby('userID').sum()
print(points_by_id)

Output:

                              points
userID                              
5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2     268
QzzucRibfSahbUGwr2PGuhFSU242     786
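
Note that groupby sorts the group keys by default, so the order above differs from the order requested in the question. If a list of dictionaries in first-appearance order is needed, something along these lines should work. This is a sketch that assumes pandas is installed; the variable names are illustrative:

```python
import pandas as pd

data = [{'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254},
        {'userID': '5lyU0TCyqRcTD3y7Rs2FGV8h2Sd2', 'points': 268},
        {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 278},
        {'userID': 'QzzucRibfSahbUGwr2PGuhFSU242', 'points': 254}]

# sort=False keeps users in order of first appearance;
# as_index=False keeps userID as a regular column instead of the index.
totals = (pd.DataFrame(data)
          .groupby('userID', sort=False, as_index=False)['points']
          .sum())

# One dict per row, matching the shape asked for in the question
result = totals.to_dict('records')
print(result)
```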