I have 2 large pandas dfs, where the first one contains various IDs a specific student. The other one has one or more of the IDs that match. I want to merge the dfs such that it will combine based on if df2 has a value found in df1.
DF1:
ID name
nfi23, wjm348 sally
hji21, arb128 joe
mbi13, ybm328 mary
DF2
ID grade
nfi23 twelfth
arb128, hji21 third
mbi13 fourth
desired:
ID name grade
nfi23, wjm348 sally twelfth
hji21, arb128 joe third
mbi13, ybm328 mary fourth
CodePudding user response:
# Break the ids apart.
df1.ID = df1.ID.str.split(', ')
df2.ID = df2.ID.str.split(', ')
df1 = df1.explode('ID')
df2 = df2.explode('ID')
# Merge on ID
out = (df1.merge(df2, 'left')
.groupby('name') # Bring it all back together~
.agg(set)
.applymap(lambda x: ', '.join(i for i in x if isinstance(i, str)))
.reset_index())
print(out)
Output:
name ID grade
0 joe hji21, arb128 third
1 mary ybm328, mbi13 fourth
2 sally nfi23, wjm348 twelfth
CodePudding user response:
Try with explode then we can do map
df2['ID'] = df2['ID'].str.split(',')
df2 = df2.explode('ID')
df1['grade'] = df1.ID.str.split(',').explode().map(df2.set_index('ID')['grade']).groupby(level=0).first()
df1
Out[774]:
ID name grade
0 nfi23,wjm348 sally twelfth
1 hji21,arb128 joe third
2 mbi13,ybm328 mary fourth
CodePudding user response:
df = pd.DataFrame(columns=['ID', 'name', 'grade'])
for i, (ids, name) in df1.iterrows():
for j, ids2 in df2.ID.iteritems():
if len(set(ids) & set(ids2)) > 0:
break
df.loc[i] = ids, name, df2.loc[j, 'grade']
print(df)
prints
index | ID | name | grade |
---|---|---|---|
0 | nfi23,wjm348 | sally | twelfth |
1 | hji21,arb128 | joe | third |
2 | mbi13,ybm328 | mary | fourth |
CodePudding user response:
You may use pd.Series.map
after making a series out of split list in df1's ID and then take the first value.
df2['ID'] = df2['ID'].str.split(',')
df2 = df2.explode('ID')
df2['ID'] = df2['ID'].str.strip()
d = df2.set_index('ID')['grade']
df1['grade'] = df1['ID'].apply(lambda x: pd.Series(x.split(',')).map(d)[0])
Output:
ID name grade
0 nfi123, wjm348 sally twelfth
1 hji21, arb128 joe third
2 mbi13, ybm328 mary fourth