I have a problem. I am merging two dataframes df1
and df2
together. I have a column that is called name
in both. The merge does everything correctly and adds a _
with an x
or y
to each name
. Unfortunately, the columns say the same thing.
Is there an option to compare the name_x
and name_y
columns and if the values in each row are identical, merge the column into name
?
It should be noted that I do not only have the column name
but several different and identical ones, so it should be like a loop. For example, my dataset has more than 100 columns, so it is difficult to determine manually which columns have the same name and which do not.
# df1
customerId name
0 1 Anton
1 2 Marie
2 3 Max
3 4 Fran
4 5 Josie
#df2
customerIddd name name2
0 1 Anton Antond
1 2 Marie Maride
2 3 Max Ma2x
3 4 Fran Frdsan
4 5 Josie Joasdsie
Code
import pandas as pd
d = {'customerId': [1, 2, 3, 4, 5],
'name': ['Anton', 'Marie', 'Max', 'Fran', 'Josie']
}
df = pd.DataFrame(data=d)
d2 = {'customerIddd': [1, 2, 3, 4, 5],
'name': ['Anton', 'Marie', 'Max', 'Fran', 'Josie'],
'name2': ['Antond', 'Maride', 'Ma2x', 'Frdsan', 'Joasdsie']
}
df2 = pd.DataFrame(data=d2)
print(df2)
df_merged = pd.merge(df,
df2, how='inner',
left_on=['customerId'], right_on=['customerIddd'])
print(df_merged)
Out
customerId name_x customerIddd name_y name2
0 1 Anton 1 Anton Antond
1 2 Marie 2 Marie Maride
2 3 Max 3 Max Ma2x
3 4 Fran 4 Fran Frdsan
4 5 Josie 5 Josie Joasdsie
What I want
customerId name customerIddd name2
0 1 Anton 1 Antond
1 2 Marie 2 Maride
2 3 Max 3 Ma2x
3 4 Fran 4 Frdsan
4 5 Josie 5 Joasdsie
CodePudding user response:
You can merge with suffixes=['', '_right']
and then drop all columns whose name ends with _right
:
df_merged = pd.merge(df,
df2, how='inner',
left_on=['customerId'], right_on=['customerIddd'],
suffixes=['', '_right'])
df_merged.drop([col for col in df_merged.columns if col.endswith('_right')], axis=1)
Output (for your sample data)
customerId name customerIddd name2
0 1 Anton 1 Antond
1 2 Marie 2 Maride
2 3 Max 3 Ma2x
3 4 Fran 4 Frdsan
4 5 Josie 5 Joasdsie
CodePudding user response:
if your name on both table is exact the same you can use it in join to
df_merged = pd.merge(df,
df2, how='inner',
left_on=['customerId','name'], right_on=['customerIddd','name'])
print(df_merged)
the output will be as you wanned
customerId name customerIddd name2
0 1 Anton 1 Antond
1 2 Marie 2 Maride
2 3 Max 3 Ma2x
3 4 Fran 4 Frdsan
4 5 Josie 5 Joasdsie