Concatenating two Pandas DataFrames without changing the index order

Time:01-14

Basic question: I am trying to merge two DataFrames on a shared column without changing the row order. For example:

import pandas as pd

df1 = pd.DataFrame({'kabat_number':['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'], 'AA':['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number':['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],'AA':['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1,df2,on='kabat_number',how='outer')
print(dfs)

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7           H30    V    G
8           H31    I    V
9          H20B  NaN    Y
10         H20C  NaN    L

The order of the merged result changed: H20B and H20C were put at the end.

But what I want to get is:

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V

I also tried sort=False, but the order still changes. How can I get the result I want? Thanks!
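A quick way to see where the reordering comes from is merge's indicator parameter, which adds a _merge column labelling each row as both, left_only or right_only. H20B and H20C exist only in df2, so there is no position in df1's order for them to inherit. Depending on the pandas version, the raw order can also differ from the output shown above (as far as I know, pandas 2.2 made how='outer' sort the keys lexicographically), so this sketch checks which rows are one-sided rather than where they land.

```python
import pandas as pd

df1 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H2A', 'H3', 'H4', 'H20', 'H20A', 'H30', 'H31'],
                    'AA': ['A', 'C', 'S', 'Y', 'R', 'C', 'Y', 'V', 'I']})
df2 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H3', 'H4', 'H20A', 'H20B', 'H20C', 'H30', 'H31'],
                    'AA': ['A', 'C', 'Y', 'R', 'C', 'Y', 'L', 'G', 'V']})

# indicator=True adds a categorical '_merge' column: 'both', 'left_only' or 'right_only'
dfs = pd.merge(df1, df2, on='kabat_number', how='outer', indicator=True)

# Keys found only in df2 (right_only) or only in df1 (left_only)
right_only = dfs.loc[dfs['_merge'] == 'right_only', 'kabat_number'].tolist()
left_only = dfs.loc[dfs['_merge'] == 'left_only', 'kabat_number'].tolist()

print(dfs)
print('only in df2:', right_only)
print('only in df1:', left_only)
```

Because merge alone does not promise the order you want, the answers below sort explicitly after merging.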

Answer:

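Sort with natsort's natsort_key after the merge. A natural sort compares runs of digits as numbers, so H2 < H2A < H3 < H20 < H20A < H20B, and sorting afterwards makes the result independent of whatever order merge produces (ignore_index=True renumbers the index 0..n-1):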

# pip install natsort
from natsort import natsort_key

dfs = (pd.merge(df1,df2,on='kabat_number',how='outer')
         .sort_values(by='kabat_number', key=natsort_key, ignore_index=True)
      )

Output:

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V
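If installing natsort is not an option, a hand-rolled natural key gives the same order for labels like these. This is a minimal sketch, not natsort's algorithm: the hypothetical helpers natural_key and natural_sort split each label into text and digit runs with re.split and compare the digit runs as integers. It assumes every value in the key column is a plain string (no NaN). Because re.split with a capturing group puts text at even positions and digits at odd positions, the tuples always compare str with str and int with int.

```python
import re

import pandas as pd


def natural_key(label: str) -> tuple:
    """Hypothetical helper: 'H20A' -> ('H', 20, 'A'), 'H2' -> ('H', 2, '')."""
    # The capturing group keeps the digit runs; they land at the odd indices
    parts = re.split(r'(\d+)', label)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def natural_sort(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: rows in natural order of `column`, fresh 0..n-1 index."""
    labels = df[column].tolist()
    order = sorted(range(len(labels)), key=lambda i: natural_key(labels[i]))
    return df.iloc[order].reset_index(drop=True)


df1 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H2A', 'H3', 'H4', 'H20', 'H20A', 'H30', 'H31'],
                    'AA': ['A', 'C', 'S', 'Y', 'R', 'C', 'Y', 'V', 'I']})
df2 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H3', 'H4', 'H20A', 'H20B', 'H20C', 'H30', 'H31'],
                    'AA': ['A', 'C', 'Y', 'R', 'C', 'Y', 'L', 'G', 'V']})

dfs = natural_sort(pd.merge(df1, df2, on='kabat_number', how='outer'), 'kabat_number')
print(dfs)
```

Sorting row positions with Python's sorted and then using iloc avoids relying on how sort_values compares tuple-valued keys; natsort handles many more cases (signs, decimals, locale), so it remains the better choice when available.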

Answer:

Try this:

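import numpy as np
import pandas as pd
from natsort import index_natsorted

df1 = pd.DataFrame({'kabat_number':['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'], 'AA':['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number':['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],'AA':['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1,df2,on='kabat_number',how='outer')
# index_natsorted(x) returns the positions that would sort x naturally;
# np.argsort of that permutation gives each row's rank, which sort_values sorts by.
# (np.argsort(natsorted(x)) would rank the already-sorted strings
# lexicographically instead, which scrambles the rows.)
dfs = dfs.sort_values(
    by='kabat_number',
    key=lambda x: np.argsort(index_natsorted(x))
    ).reset_index(drop=True)
print(dfs)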
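Why the key wraps the sort indices in np.argsort: sort_values calls the key on the column and orders the rows by the values it returns, so the key has to give each row its rank. A permutation that sorts the data (which is what natsort's index_natsorted returns, unlike natsorted, which returns the sorted values) is not a rank, but its argsort is, because the argsort of a permutation is its inverse. A small NumPy-only illustration, with plain integers standing in for the natural-sort comparison:

```python
import numpy as np

values = np.array([30, 10, 20, 40])

# Positions that would sort `values`: element 1 first, then 2, 0, 3
order = np.argsort(values)   # array([1, 2, 0, 3])

# argsort of that permutation inverts it: the rank of each original element
ranks = np.argsort(order)    # array([2, 0, 1, 3])

# Ordering the elements by rank reproduces the sorted sequence
print(values[np.argsort(ranks)])  # [10 20 30 40]
```

Passing the sorted values themselves to np.argsort, as in np.argsort(natsorted(x)), yields neither of these arrays, which is why that key does not produce the natural order.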