Home > Software design >  pandas merge two dataframe and sort by compared column in adjacent column
pandas merge two dataframe and sort by compared column in adjacent column

Time:11-27

I compare two dataframe and result can be shown below;

import pandas as pd

exam_1 = {
  'Name': ['Jonn', 'Tomas', 'Fran', 'Olga', 'Veronika', 'Stephan'],
  'Mat': [85, 75, 50, 93, 88, 90],
  'Science': [96, 97, 99, 87, 90, 88],
  'Reading': [80, 60, 72, 86, 84, 77],
  'Wiritng': [78, 82, 88, 78, 86, 82],
  'Lang': [77, 79, 77, 72, 90, 92],
}

exam_2 = {
  'Name': ['Jonn', 'Tomas', 'Fran', 'Olga', 'Veronika', 'Stephan'],
  'Mat': [80, 80, 90, 90, 85, 80],
  'Science': [50, 60, 85, 90, 66, 82],
  'Reading': [60, 75, 55, 90, 85, 60],
  'Wiritng': [56, 66, 90, 82, 60, 80],
  'Lang': [80, 78, 76, 90, 77, 66],
}

df_1 = pd.DataFrame(exam_1)
df_2 = pd.DataFrame(exam_2)

cmp = pd.merge(df_1, df_2, how="outer", on=["Name"], suffixes=("_1", "_2"))
print(cmp)
       Name  Mat_1  Science_1  Reading_1  Wiritng_1  Lang_1  Mat_2  Science_2  Reading_2  Wiritng_2  Lang_2
0      Jonn     85         96         80         78      77     80         50         60         56      80
1     Tomas     75         97         60         82      79     80         60         75         66      78
2      Fran     50         99         72         88      77     90         85         55         90      76
3      Olga     93         87         86         78      72     90         90         90         82      90
4  Veronika     88         90         84         86      90     85         66         85         60      77
5   Stephan     90         88         77         82      92     80         82         60         80      66

But I want to see Mat_1 and Mat_2 in adjacent column and also others.

I try to do it manually but is there any easy way to do it like already built-in function.

CodePudding user response:

You can use pandas.DataFrame.sort_index on axis=1.

Replace this :

cmp = pd.merge(df_1, df_2, how="outer", on=["Name"], suffixes=("_1", "_2"))

By this :

cmp = (
        pd.merge(df_1, df_2, how="outer", on=["Name"], suffixes=("_1", "_2"))
            .set_index("Name")
            .sort_index(axis=1)
            .reset_index()
      )
​

# Output :

print(cmp.to_string()) 
​
       Name  Lang_1  Lang_2  Mat_1  Mat_2  Reading_1  Reading_2  Science_1  Science_2  Wiritng_1  Wiritng_2
0      Jonn      77      80     85     80         80         60         96         50         78         56
1     Tomas      79      78     75     80         60         75         97         60         82         66
2      Fran      77      76     50     90         72         55         99         85         88         90
3      Olga      72      90     93     90         86         90         87         90         78         82
4  Veronika      90      77     88     85         84         85         90         66         86         60
5   Stephan      92      66     90     80         77         60         88         82         82         80
  • Related