Suppose I have a DataFrame df:
| CHROM |
| ------|
| chr1 |
| chr5 |
| chr12 |
| chr9 |
| chr3 |
I used this code to sort it:
sorted_df = df.sort_values(by=["CHROM"])
and got this result:
| CHROM |
| ------|
| chr1 |
| chr12 |
| chr3 |
| chr5 |
| chr9 |
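This order comes from plain string comparison, which works character by character: "chr12" sorts before "chr3" because the first differing characters are "1" and "3". The minimal sketch below shows this with the built-in sorted(). It also shows that comparing the numeric part as an int gives the expected order, assuming every label starts with exactly "chr".

```python
labels = ["chr1", "chr5", "chr12", "chr9", "chr3"]

# Lexicographic (string) order: "chr12" < "chr3" because "1" < "3"
lexical = sorted(labels)
print(lexical)

# Numeric order: strip the (assumed) 3-character "chr" prefix and compare ints
numeric = sorted(labels, key=lambda s: int(s[3:]))
print(numeric)
```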
But my expected output is:
| CHROM |
| ------|
| chr1 |
| chr3 |
| chr5 |
| chr9 |
| chr12 |
How can I sort the chromosomes in numeric order using Python?
Answer:
You can split off the "chr" prefix, convert the remaining number to int, sort those numbers, and then use the resulting index to reorder the DataFrame:
idx = df["CHROM"].str.split('chr').str[-1].astype(int).sort_values().index
# .loc selects by index label, so this also works when df does not have a default 0..n-1 index
df_new = df.loc[idx]
print(df_new)
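A self-contained sketch of the split-and-reindex approach follows. It rebuilds the sample data from the question, but gives it a non-default index (the labels 10 to 50 are hypothetical, chosen only for illustration). With those labels, selecting with .iloc would fail, because 50 is not a valid row position in a 5-row frame, while .loc works.

```python
import pandas as pd

# Sample data from the question, with a hypothetical non-default index
df = pd.DataFrame(
    {"CHROM": ["chr1", "chr5", "chr12", "chr9", "chr3"]},
    index=[10, 20, 30, 40, 50],
)

# Numeric part of each label, e.g. "chr12" -> "12" -> 12
nums = df["CHROM"].str.split("chr").str[-1].astype(int)

# Index labels ordered by the numeric part
idx = nums.sort_values().index

# Select rows by label; df.iloc[idx] would raise IndexError here
df_new = df.loc[idx]
print(df_new)
```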
Alternatively, you can use pandas.DataFrame.sort_values with the key argument (available since pandas 1.1.0):
df.sort_values(by=['CHROM'],
               key=lambda x: x.str.split('chr').str[-1].astype(int))
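Here is a runnable sketch of the key approach using the question's sample data. The key callable receives the column being sorted as a Series and returns a Series of the same length. pandas then orders the rows by those returned values, while the original strings stay in the result.

```python
import pandas as pd

df = pd.DataFrame({"CHROM": ["chr1", "chr5", "chr12", "chr9", "chr3"]})

# key maps each "chrN" string to the integer N; rows are ordered by N
sorted_df = df.sort_values(
    by="CHROM",
    key=lambda col: col.str.split("chr").str[-1].astype(int),
)
print(sorted_df)
```

If you sort by more than one column, pandas calls the key function on each column listed in by, so the function has to handle all of them.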
Output:
CHROM
0 chr1
4 chr3
1 chr5
3 chr9
2 chr12