Home > Enterprise >  Merging two dataframes when none of the columns are an exact match
Merging two dataframes when none of the columns are an exact match

Time:07-01

I have two panda large dataframes that I would like to merge using the second column. The issue is that the first dataframe is longer than the second dataframe, so there are some names that won't be found in the second column. I would like to merge the two dataframes such that the rows that can't be merged just stay as they are and the ones that can be, the column 1 values are listed in a comma separated list. I have two short examples of the type of dataframe I am working with as well as the desired result.

DF 1

col 1   col 2
1       Johnny   
7       Cathy
4       Becky 
2       Sarah 
33      Courtney
83      Avery
78      Adam

DF 2

col 1   col 2
12      Johnny   
3       Cathy
13      Becky 
15      Sarah 
55      Adam

Desired result:

col 1      col 2
1, 12      Johnny   
7, 3       Cathy
4, 13      Becky 
2, 15      Sarah 
33         Courtney
83         Avery
78, 55     Adam

CodePudding user response:

If you don't need that specific row order in the output dataframe, you can just do this:

import pandas as pd

df1 = pd.DataFrame(
    {'col1': [1,7,4,2,33,83,78],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Courtney', 'Avery', 'Adam']}
)
df2 = pd.DataFrame(
    {'col1': [12,3,13,15,55],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Adam']}
)

df = pd.concat([df1, df2], axis=0).groupby('col2').agg(list).reset_index()
print(df)

#   col2    col1
# 0 Adam    [78, 55]
# 1 Avery   [83]
# 2 Becky   [4, 13]
# 3 Cathy   [7, 3]
# 4 Courtney    [33]
# 5 Johnny  [1, 12]
# 6 Sarah   [2, 15]
  • Related