Home > front end >  Merging 2 pandas DataFrame based on the content of a column
Merging 2 pandas DataFrame based on the content of a column

Time:09-21

I have 2 pandas DataFrames (df1, and df2) like this.

import pandas as pd

data1 = [['tom', '10'], ['nick', '15'], ['juli', '14']]
df1 = pd.DataFrame(data1, columns=['name', 'id'])

data2 = [['tom', '59'], ['jane', '20'], ['leo', '17']]
df2 = pd.DataFrame(data2, columns=['name', 'id'])

# df1
#    name  id
# 0   tom  10
# 1  nick  15
# 2  juli  14

# df2
#    name  id
# 0   tom  59
# 1  jane  20
# 2   leo  17

How can I merge them into the following DataFrame?

data_merged = [['tom', '10', 'tom', '59'], ['nick', '15', '', ''], ['juli', '14', '', ''],
               ['', '', 'jane', '20'], ['', '', 'leo', '17']]
df_merged = pd.DataFrame(data_merged, columns=['name_1', 'id_1', 'name_2', 'id_2'])

# df_merged
#   name_1 id_1 name_2 id_2
# 0    tom   10    tom   59
# 1   nick   15            
# 2   juli   14            
# 3               jane   20
# 4                leo   17

The rule of merge is as follows:

If the content of the name column in df1 and df2 are identical, they would appear at the same row in df_merged.

Otherwise, place the data in different rows of df_merged.

CodePudding user response:

Use DataFrame.add_suffix for both DataFrames with outer join in DataFrame.merge:

df_merged = df1.add_suffix('_1').merge(df2.add_suffix('_2'), 
                                       left_on='name_1', 
                                       right_on='name_2', 
                                       how='outer').fillna('')
print (df_merged)
  name_1 id_1 name_2 id_2
0    tom   10    tom   59
1   nick   15            
2   juli   14            
3               jane   20
4                leo   17

If omit add_suffix output is different:

df_merged = df1.merge(df2, on='name', how='outer', suffixes=('_1','_2')).fillna('')
print (df_merged)
   name id_1 id_2
0   tom   10   59
1  nick   15     
2  juli   14     
3  jane        20
4   leo        17

So used rename for 2 columns for names:

df_merged=df1.rename(columns={'name':'name_1'}).merge(df2.rename(columns={'name':'name_2'}),
                                                      left_on='name_1', 
                                                      right_on='name_2', 
                                                      how='outer',
                                                      suffixes=('_1','_2')).fillna('')
print (df_merged)
  name_1 id_1 name_2 id_2
0    tom   10    tom   59
1   nick   15            
2   juli   14            
3               jane   20
4                leo   17
  • Related