Sorting a column by a number in a string

I have a data frame that looks something like this:

       ID Data
0  1_SS22   D1
1  7_SS22   D7
2  4_SS22   D4
3  3_SS22   D3
4  8_SS22   D8
5  6_SS22   D6
6  2_SS22   D2
7  5_SS22   D5

I want to sort the 'ID' column by the number associated with it so that my data frame looks like this:

       ID Data
0  1_SS22   D1
1  2_SS22   D2
2  3_SS22   D3
3  4_SS22   D4
4  5_SS22   D5
5  6_SS22   D6
6  7_SS22   D7
7  8_SS22   D8

Is there any way to do this? I've tried using df.sort_index(), but that doesn't seem to work. I think the problem is that the column contains strings and not integers.

CodePudding user response：

You can use .str.split('_'), .str[0], and .astype(int) to extract the numeric prefix from the ID column. Sort those values with sort_values, use the resulting index to reorder the original dataframe through df.loc, and finally call reset_index(drop=True) to renumber the rows:

df_sorted = df.loc[df['ID'].str.split('_').str[0].astype(int).sort_values().index].reset_index(drop=True)

Output:

>>> df_sorted
       ID Data
0  1_SS22   D1
1  2_SS22   D2
2  3_SS22   D3
3  4_SS22   D4
4  5_SS22   D5
5  6_SS22   D6
6  7_SS22   D7
7  8_SS22   D8
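
For comparison, here is a rough sketch of the same idea using the key argument of sort_values, which I believe is available from pandas 1.1 onward. The helper name leading_number is made up for this example, and the data frame is rebuilt from the question so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["1_SS22", "7_SS22", "4_SS22", "3_SS22",
           "8_SS22", "6_SS22", "2_SS22", "5_SS22"],
    "Data": ["D1", "D7", "D4", "D3", "D8", "D6", "D2", "D5"],
})

def leading_number(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: return the integer before the first underscore."""
    return ids.str.split("_").str[0].astype(int)

# key receives the whole 'ID' column and must return a Series of the
# same length; the rows are then ordered by those returned values.
df_sorted = df.sort_values("ID", key=leading_number).reset_index(drop=True)
print(df_sorted)
```

This should give the same result as the one-liner above without going through df.loc, at the cost of requiring a pandas version that supports key.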

CodePudding user response：

df.sort_values("Data")

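You can call sort_values on a string column as well. It sorts lexicographically, so it gives the desired order here only because every number is a single digit.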
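
A small sketch of where plain string sorting would stop matching numeric order. The value "D10" is hypothetical and does not appear in the question's data; it is there only to show the two-digit case:

```python
import pandas as pd

# "D10" is a made-up value added to illustrate the two-digit case.
data = pd.Series(["D2", "D10", "D1"])

# Plain string sort compares character by character, so "D10" lands before "D2".
as_strings = data.sort_values().tolist()

# Sorting on the integer after the leading "D" restores numeric order.
as_numbers = data.sort_values(key=lambda s: s.str[1:].astype(int)).tolist()

print(as_strings)  # ['D1', 'D10', 'D2']
print(as_numbers)  # ['D1', 'D2', 'D10']
```

So sorting on the string works for the eight rows shown, but extracting the number is likely the safer choice if the IDs can go past 9.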