Home > front end >  Recognise identical columns and melt them into one
Recognise identical columns and melt them into one

Time:05-31

I have a problem. I am merging two dataframes df1 and df2 together. I have a column that is called name in both. The merge does everything correctly and adds a _ with an x or y to each name. Unfortunately, the columns say the same thing. Is there an option to compare the name_x and name_y columns and if the values in each row are identical, merge the column into name?

It should be noted that I do not only have the column name but several different and identical ones, so it should be like a loop. For example, my dataset has more than 100 columns, so it is difficult to determine manually which columns have the same name and which do not.

# df1
   customerId   name
0           1  Anton
1           2  Marie
2           3    Max
3           4   Fran
4           5  Josie

#df2
   customerIddd   name     name2
0             1  Anton    Antond
1             2  Marie    Maride
2             3    Max      Ma2x
3             4   Fran    Frdsan
4             5  Josie  Joasdsie

Code

import pandas as pd

d = {'customerId': [1, 2, 3, 4, 5],
     'name': ['Anton', 'Marie', 'Max', 'Fran', 'Josie']
    }
df = pd.DataFrame(data=d)

d2 = {'customerIddd': [1, 2, 3, 4, 5],
      'name': ['Anton', 'Marie', 'Max', 'Fran', 'Josie'],
     'name2': ['Antond', 'Maride', 'Ma2x', 'Frdsan', 'Joasdsie']
    }
df2 = pd.DataFrame(data=d2)
print(df2)

df_merged = pd.merge(df,
            df2, how='inner',
            left_on=['customerId'], right_on=['customerIddd'])
print(df_merged)

Out

   customerId name_x  customerIddd name_y     name2
0           1  Anton             1  Anton    Antond
1           2  Marie             2  Marie    Maride
2           3    Max             3    Max      Ma2x
3           4   Fran             4   Fran    Frdsan
4           5  Josie             5  Josie  Joasdsie

What I want

   customerId name    customerIddd    name2
0           1  Anton             1   Antond
1           2  Marie             2   Maride
2           3    Max             3     Ma2x
3           4   Fran             4   Frdsan
4           5  Josie             5  Joasdsie

CodePudding user response:

You can merge with suffixes=['', '_right'] and then drop all columns whose name ends with _right:

df_merged = pd.merge(df,
            df2, how='inner',
            left_on=['customerId'], right_on=['customerIddd'],
            suffixes=['', '_right'])
df_merged.drop([col for col in df_merged.columns if col.endswith('_right')], axis=1)

Output (for your sample data)

   customerId   name  customerIddd     name2
0           1  Anton             1    Antond
1           2  Marie             2    Maride
2           3    Max             3      Ma2x
3           4   Fran             4    Frdsan
4           5  Josie             5  Joasdsie

CodePudding user response:

if your name on both table is exact the same you can use it in join to

df_merged = pd.merge(df,
        df2, how='inner',
        left_on=['customerId','name'], right_on=['customerIddd','name'])
print(df_merged)

the output will be as you wanned

   customerId   name  customerIddd     name2
0           1  Anton             1    Antond
1           2  Marie             2    Maride
2           3    Max             3      Ma2x
3           4   Fran             4    Frdsan
4           5  Josie             5  Joasdsie
  • Related