I have two panda large dataframes that I would like to merge using the second column. The issue is that the first dataframe is longer than the second dataframe, so there are some names that won't be found in the second column. I would like to merge the two dataframes such that the rows that can't be merged just stay as they are and the ones that can be, the column 1 values are listed in a comma separated list. I have two short examples of the type of dataframe I am working with as well as the desired result.
DF 1
col 1 col 2
1 Johnny
7 Cathy
4 Becky
2 Sarah
33 Courtney
83 Avery
78 Adam
DF 2
col 1 col 2
12 Johnny
3 Cathy
13 Becky
15 Sarah
55 Adam
Desired result:
col 1 col 2
1, 12 Johnny
7, 3 Cathy
4, 13 Becky
2, 15 Sarah
33 Courtney
83 Avery
78, 55 Adam
CodePudding user response:
If you don't need that specific row order in the output dataframe, you can just do this:
import pandas as pd
df1 = pd.DataFrame(
{'col1': [1,7,4,2,33,83,78],
'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Courtney', 'Avery', 'Adam']}
)
df2 = pd.DataFrame(
{'col1': [12,3,13,15,55],
'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Adam']}
)
df = pd.concat([df1, df2], axis=0).groupby('col2').agg(list).reset_index()
print(df)
# col2 col1
# 0 Adam [78, 55]
# 1 Avery [83]
# 2 Becky [4, 13]
# 3 Cathy [7, 3]
# 4 Courtney [33]
# 5 Johnny [1, 12]
# 6 Sarah [2, 15]