Basic question: I am trying to merge two DataFrames on the same column without changing the row order. For example:
df1 = pd.DataFrame({'kabat_number':['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'], 'AA':['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number':['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],'AA':['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1,df2,on='kabat_number',how='outer')
print(dfs)
   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7           H30    V    G
8           H31    I    V
9          H20B  NaN    Y
10         H20C  NaN    L
The row order of the merged result changed: H20B and H20C were put at the end.
What I want to get is:
   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V
I also tried sort=False, but the order still changes. How can I get the result I want? Thanks!
CodePudding user response:
Sort with natsort_key after the merge:
# pip install natsort
from natsort import natsort_key
dfs = (pd.merge(df1, df2, on='kabat_number', how='outer')
         .sort_values(by='kabat_number', key=natsort_key, ignore_index=True)
      )
Output:
   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V
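If installing natsort is not an option, roughly the same ordering can be sketched with the standard library. This assumes every label looks like letters, then digits, then an optional letter suffix (as in H20A); `kabat_key` and `natural_rank` are hypothetical helper names, not part of pandas or natsort.

```python
import re
import pandas as pd

def kabat_key(label):
    # Hypothetical helper: split e.g. 'H20A' into ('H', 20, 'A').
    # Assumes the letters + digits + optional-letters shape seen in the question.
    m = re.fullmatch(r"([A-Za-z]+)(\d+)([A-Za-z]*)", label)
    return (m.group(1), int(m.group(2)), m.group(3))

def natural_rank(col):
    # sort_values(key=...) passes the whole column as a Series and expects
    # a same-length result, so map every label to its rank in natural order.
    ordered = sorted(col.unique(), key=kabat_key)
    rank = {label: i for i, label in enumerate(ordered)}
    return col.map(rank)

df1 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H2A', 'H3', 'H4', 'H20', 'H20A', 'H30', 'H31'],
                    'AA': ['A', 'C', 'S', 'Y', 'R', 'C', 'Y', 'V', 'I']})
df2 = pd.DataFrame({'kabat_number': ['H1', 'H2', 'H3', 'H4', 'H20A', 'H20B', 'H20C', 'H30', 'H31'],
                    'AA': ['A', 'C', 'Y', 'R', 'C', 'Y', 'L', 'G', 'V']})

dfs = (pd.merge(df1, df2, on='kabat_number', how='outer')
         .sort_values(by='kabat_number', key=natural_rank, ignore_index=True))
print(dfs)
```

Labels that do not match the assumed pattern would make `kabat_key` fail, so natsort remains the more general choice.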
CodePudding user response:
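Try this:
import numpy as np
import pandas as pd
from natsort import index_natsorted  # pip install natsort

df1 = pd.DataFrame({'kabat_number':['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'], 'AA':['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number':['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],'AA':['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1, df2, on='kabat_number', how='outer')
# index_natsorted returns the positions that put the column in natural order;
# np.argsort turns those positions into a rank for each row, which is what
# the key function has to return (natsorted would return the sorted values instead).
dfs = dfs.sort_values(
    by='kabat_number',
    key=lambda x: np.argsort(index_natsorted(x))
).reset_index(drop=True)
print(dfs)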