Pandas: Merge contents of a dataframe into a single column (as a list of dict / json)

Time:09-27

I want to add the contents of one DataFrame to another as a list, keyed on ID. I know how to merge on ID, but I do not want duplicate rows per ID in the new DataFrame. How do I get this done?

import numpy as np
import pandas as pd

data1 = {'ID': ['AB01','AB02'],
    'Name': ["toyota", "honda"],
    'Age':[21,22]
   }
df1 = pd.DataFrame.from_dict(data1)
data2 = {'ID': ['AB01','AB01','AB03','AB03'],
    'Type': ["C",np.nan,"X","S"],
    'Score':[87,98,45,82]
   }
df2 = pd.DataFrame.from_dict(data2)

The result should have one row per ID from df1, with the matching rows of df2 merged into a single column as a list of dicts.

CodePudding user response:

Try merge:

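merged = df1.merge(df2, on='ID', how='left')

def to_records(g):
    # ID plus the columns that came from df2 (everything after Name and Age)
    info = g[['ID']].join(g.iloc[:, 3:])
    # Empty list when every row of the group has a missing df2 value
    return info.to_dict('records') if info.dropna().any().any() else []

print(merged.groupby(['ID', 'Name', 'Age']).apply(to_records).reset_index(name='Info'))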

Output:

     ID    Name  Age                                               Info
0  AB01  toyota   21  [{'ID': 'AB01', 'Type': 'C', 'Score': 87.0}, {...
1  AB02   honda   22                                                 []

CodePudding user response:

You can convert each row of df2 to a dict with .apply(), then group by ID and aggregate the dicts that share an ID into a list with .groupby() and .agg().

Then merge the result into df1 with .merge(), using a left join on ID:

df2_info = (df2.apply(dict, axis=1)
               .groupby(df2['ID'])
               .agg(list)
               .reset_index(name='Info')
           )

df_out = df1.merge(df2_info, on='ID', how='left')

Result

print(df_out)

     ID    Name  Age                                                                                  Info
0  AB01  toyota   21  [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1  AB02   honda   22                                                                                   NaN
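
Note that AB02 has no rows in df2, so the left join leaves NaN in Info. If you would rather have an empty list there (as in the first answer's output), one possible approach is sketched below. It rebuilds the frames from the question so that it runs on its own; the isinstance check is my own addition, not part of the answer above, and it assumes Info only ever holds lists or NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['AB01', 'AB02'],
                    'Name': ['toyota', 'honda'],
                    'Age': [21, 22]})
df2 = pd.DataFrame({'ID': ['AB01', 'AB01', 'AB03', 'AB03'],
                    'Type': ['C', np.nan, 'X', 'S'],
                    'Score': [87, 98, 45, 82]})

# One list of row dicts per ID, same steps as in the answer above
df2_info = (df2.apply(dict, axis=1)
               .groupby(df2['ID'])
               .agg(list)
               .reset_index(name='Info'))

df_out = df1.merge(df2_info, on='ID', how='left')

# fillna() rejects a list as the fill value, so swap the missing
# entries element by element: keep real lists, replace anything else with []
df_out['Info'] = df_out['Info'].apply(lambda v: v if isinstance(v, list) else [])

print(df_out)
```

Each missing entry gets its own new list here, so changing one row's list later does not affect the others.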

For reference only, interim result of df2_info:

     ID                                                                                  Info
0  AB01  [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1  AB03  [{'ID': 'AB03', 'Type': 'X', 'Score': 45}, {'ID': 'AB03', 'Type': 'S', 'Score': 82}]
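
The title also mentions JSON. If you want Info to hold a JSON string per ID instead of a Python list of dicts, a minimal sketch could look like the following. It assumes the same df1 and df2 as in the question; info_json and df_json are names made up for this example. It uses DataFrame.to_json(orient='records') on each group, which writes missing values as null.

```python
import json

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['AB01', 'AB02'],
                    'Name': ['toyota', 'honda'],
                    'Age': [21, 22]})
df2 = pd.DataFrame({'ID': ['AB01', 'AB01', 'AB03', 'AB03'],
                    'Type': ['C', np.nan, 'X', 'S'],
                    'Score': [87, 98, 45, 82]})

# Build one JSON array string per ID; iterating over the groupby
# yields (key, sub-DataFrame) pairs with all columns kept
info_json = {key: g.to_json(orient='records') for key, g in df2.groupby('ID')}

# Look up each df1 ID; IDs without rows in df2 get an empty JSON array
df_json = df1.assign(Info=df1['ID'].map(info_json).fillna('[]'))

# The strings parse back into lists of dicts
print(json.loads(df_json.loc[0, 'Info']))
```

Going through to_json rather than json.dumps on the dicts avoids NaN ending up in the output, which strict JSON parsers do not accept.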