I have 2 pandas DataFrames (df1
, and df2
) like this.
import pandas as pd
data1 = [['tom', '10'], ['nick', '15'], ['juli', '14']]
df1 = pd.DataFrame(data1, columns=['name', 'id'])
data2 = [['tom', '59'], ['jane', '20'], ['leo', '17']]
df2 = pd.DataFrame(data2, columns=['name', 'id'])
# df1
# name id
# 0 tom 10
# 1 nick 15
# 2 juli 14
# df2
# name id
# 0 tom 59
# 1 jane 20
# 2 leo 17
How can I merge them into the following DataFrame?
data_merged = [['tom', '10', 'tom', '59'], ['nick', '15', '', ''], ['juli', '14', '', ''],
['', '', 'jane', '20'], ['', '', 'leo', '17']]
df_merged = pd.DataFrame(data_merged, columns=['name_1', 'id_1', 'name_2', 'id_2'])
# df_merged
# name_1 id_1 name_2 id_2
# 0 tom 10 tom 59
# 1 nick 15
# 2 juli 14
# 3 jane 20
# 4 leo 17
The rule of merge is as follows:
If the content of the name
column in df1
and df2
are identical, they would appear at the same row in df_merged
.
Otherwise, place the data in different rows of df_merged
.
CodePudding user response:
Use DataFrame.add_suffix
for both DataFrames with outer join in DataFrame.merge
:
df_merged = df1.add_suffix('_1').merge(df2.add_suffix('_2'),
left_on='name_1',
right_on='name_2',
how='outer').fillna('')
print (df_merged)
name_1 id_1 name_2 id_2
0 tom 10 tom 59
1 nick 15
2 juli 14
3 jane 20
4 leo 17
If omit add_suffix
output is different:
df_merged = df1.merge(df2, on='name', how='outer', suffixes=('_1','_2')).fillna('')
print (df_merged)
name id_1 id_2
0 tom 10 59
1 nick 15
2 juli 14
3 jane 20
4 leo 17
So used rename
for 2 columns for name
s:
df_merged=df1.rename(columns={'name':'name_1'}).merge(df2.rename(columns={'name':'name_2'}),
left_on='name_1',
right_on='name_2',
how='outer',
suffixes=('_1','_2')).fillna('')
print (df_merged)
name_1 id_1 name_2 id_2
0 tom 10 tom 59
1 nick 15
2 juli 14
3 jane 20
4 leo 17