Sorting a DataFrame by columns using Python

Time:12-16

Suppose I have a DataFrame:

df =
| CHROM | 
| ------| 
| chr1  | 
| chr5  |
| chr12 |
| chr9  |
| chr3  |

I have used this code to sort them:

sorted_df = df.sort_values(by=["CHROM"])

I got this result:

| CHROM | 
| ------| 
| chr1  | 
| chr12 |
| chr3  |
| chr5  |
| chr9  |

But my expected output is:

| CHROM | 
| ------| 
| chr1  | 
| chr3  |
| chr5  |
| chr9  |
| chr12 |

Please suggest how to do this using Python.

CodePudding user response:

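The column holds strings, so sort_values compares them character by character, which puts "chr12" before "chr3". You can split off the "chr" prefix, convert the remaining number to int, sort those values, and then use the resulting index order to reorder df.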

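# Strip the "chr" prefix, sort the numbers as integers, and keep the sorted index
idx = df["CHROM"].str.split('chr').str[-1].astype(int).sort_values().index
# idx holds index labels, so select with .loc rather than positional .iloc
df_new = df.loc[idx]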
print(df_new)
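
As a self-contained check of the index-based approach, here is a minimal sketch that rebuilds the sample data from the question. The non-default index (10 to 14) is an assumption added for illustration: it shows why label-based .loc is the safer choice, since positional .iloc only lines up with these labels when the index is the default 0..n-1.

```python
import pandas as pd

# Sample data from the question; the index 10..14 is an illustrative assumption.
df = pd.DataFrame(
    {"CHROM": ["chr1", "chr5", "chr12", "chr9", "chr3"]},
    index=[10, 11, 12, 13, 14],
)

# Numeric part of each label, e.g. "chr12" -> 12.
chrom_number = df["CHROM"].str.split("chr").str[-1].astype(int)

# Index labels in ascending numeric order, then label-based selection.
idx = chrom_number.sort_values().index
df_new = df.loc[idx]
print(df_new["CHROM"].tolist())
```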

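Or you can use pandas.DataFrame.sort_values with the key argument (available since pandas 1.1.0). The key function receives the column being sorted and must return a Series of the same length.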

df.sort_values(by=['CHROM'], 
               key=lambda x: x.str.split('chr').str[-1].astype(int))

Output:

   CHROM
0   chr1
4   chr3
1   chr5
3   chr9
2  chr12
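
Note that astype(int) raises an error if any label has a non-numeric suffix. If the column might also contain labels such as "chrX" (an assumption; the question only shows numeric ones), one possible sketch is a key function that maps non-numeric labels to NaN so they sort last. The helper name chrom_sort_key is hypothetical.

```python
import pandas as pd


def chrom_sort_key(col: pd.Series) -> pd.Series:
    """Hypothetical helper: numeric value for 'chr<digits>', NaN otherwise."""
    digits = col.str.extract(r"^chr(\d+)$", expand=False)
    return pd.to_numeric(digits)


# Illustrative data: the question's labels with "chr5" swapped for "chrX".
df = pd.DataFrame({"CHROM": ["chr1", "chrX", "chr12", "chr9", "chr3"]})

# NaN keys (non-numeric labels) are placed after all numeric ones.
sorted_df = df.sort_values(by="CHROM", key=chrom_sort_key, na_position="last")
print(sorted_df["CHROM"].tolist())
```

If several non-numeric labels are present, their relative order would need an extra rule, which this sketch does not define.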