I want to compare each element in a list of lists with a dataframe column. For example,
import pandas as pd
groups_rids = [['AX1', 'AX2'], ['AX6', 'AX5', 'AX17']]
df = pd.DataFrame({'rid': ['AX1', 'AX2', 'AX6', 'AX5', 'AX17'],
                   'pid': ['P2', 'P0', 'P3', 'P9', 'P13']})
Here `groups_rids` is the list of lists. Each of its elements has to be compared with the `rid` column of `df`.
Dataset:
|rid|pid|
|:---- |:------:|
|AX1|P2|
|AX2|P0|
|AX6|P3|
|AX5|P9|
|AX17|P13|
My result should be:

|groups_rids|pid|
|:---- |:------:|
|[AX1,AX2]|[P2,P0]|
|[AX6,AX5,AX17]|[P3,P9,P13]|
For each `rid` in each list of `groups_rids`, I want to look it up in `df` and, if it is present, append the corresponding `pid` to that group's result list.
The dataset is large, so three nested `for` loops take forever to produce the result. Is there a way to get the desired result without three nested `for` loops?
Answer:
Build a dict that maps each `rid` to its `pid`:
d = df.set_index('rid')['pid'].to_dict()
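To make the idea concrete, here is a small sketch of what the dict contains and why it removes the inner scan over `df`: building it is one pass over the frame, and each later lookup is an average O(1) hash access. It assumes each `rid` appears only once in `df` (with duplicates, the last `pid` wins). `'AX99'` is a made-up rid used only to show the missing-key case.

```python
import pandas as pd

df = pd.DataFrame({'rid': ['AX1', 'AX2', 'AX6', 'AX5', 'AX17'],
                   'pid': ['P2', 'P0', 'P3', 'P9', 'P13']})

# One pass over the frame builds a hash map rid -> pid.
d = df.set_index('rid')['pid'].to_dict()

# Lookups are now dict accesses instead of searches through df.
print(d['AX6'])        # P3

# .get returns None instead of raising KeyError for a rid that is not in df
# ('AX99' is a hypothetical rid, used only for illustration).
print(d.get('AX99'))   # None
```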
Then use it to build the DataFrame:
pd.DataFrame(((x, [d[el] for el in x if el in d]) for x in groups_rids), columns=['groups_rids', 'pid'])
        groups_rids            pid
0        [AX1, AX2]       [P2, P0]
1  [AX6, AX5, AX17]  [P3, P9, P13]
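If you would rather stay inside pandas, one alternative (a different technique from the dict comprehension above, offered as a sketch) is to explode the groups into one row per `rid`, map each `rid` to its `pid` with a vectorized `Series.map`, and collect the results back per group. It assumes `rid` values in `df` are unique, since mapping through a Series with a duplicated index fails. A group with no matching `rid` at all ends up with NaN rather than an empty list.

```python
import pandas as pd

groups_rids = [['AX1', 'AX2'], ['AX6', 'AX5', 'AX17']]
df = pd.DataFrame({'rid': ['AX1', 'AX2', 'AX6', 'AX5', 'AX17'],
                   'pid': ['P2', 'P0', 'P3', 'P9', 'P13']})

# Series of lists: index = group number, value = list of rids in that group.
groups = pd.Series(groups_rids, name='rid')

# One row per (group, rid) pair; the group number is kept as the index.
exploded = groups.explode()

# Vectorized lookup through a rid-indexed Series of pids.
# rids missing from df become NaN and are dropped ("if present").
pids = exploded.map(df.set_index('rid')['pid']).dropna()

# Collect the pids back into one list per group, keeping their order.
result = pd.DataFrame({
    'groups_rids': groups,
    'pid': pids.groupby(level=0).agg(list),
})
print(result)
```

Both approaches avoid scanning `df` once per `rid`; the dict version is usually simpler to read, while this one keeps the work in pandas operations.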