I have a data frame that looks something like this:
       ID Data
0  1_SS22   D1
1  7_SS22   D7
2  4_SS22   D4
3  3_SS22   D3
4  8_SS22   D8
5  6_SS22   D6
6  2_SS22   D2
7  5_SS22   D5
I want to sort the 'ID' column by the number associated with it so that my data frame looks like this:
       ID Data
0  1_SS22   D1
1  2_SS22   D2
2  3_SS22   D3
3  4_SS22   D4
4  5_SS22   D5
5  6_SS22   D6
6  7_SS22   D7
7  8_SS22   D8
Is there any way to do this? I've tried using df.sort_index(), but that doesn't seem to work. I think the problem is that the column contains strings rather than integers.
CodePudding user response:
You can use .str.split, .str[0], and .astype(int) to get the numeric value from the ID column. Then sort those values with sort_values, take the index of the result, and use it to index the original dataframe. Finally, use reset_index() to straighten out the index:
df_sorted = df.loc[df['ID'].str.split('_').str[0].astype(int).sort_values().index].reset_index(drop=True)
Output:
>>> df_sorted
       ID Data
0  1_SS22   D1
1  2_SS22   D2
2  3_SS22   D3
3  4_SS22   D4
4  5_SS22   D5
5  6_SS22   D6
6  7_SS22   D7
7  8_SS22   D8
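The same idea can probably be written more compactly. This is only a sketch, assuming pandas 1.1 or later, where sort_values accepts a key callable that receives the column as a Series and returns the values to sort by. The sample frame is rebuilt from the question, and the helper name numeric_prefix is made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": ["1_SS22", "7_SS22", "4_SS22", "3_SS22",
           "8_SS22", "6_SS22", "2_SS22", "5_SS22"],
    "Data": ["D1", "D7", "D4", "D3", "D8", "D6", "D2", "D5"],
})


def numeric_prefix(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: the integer before the first underscore of each ID."""
    return ids.str.split("_").str[0].astype(int)


# key= needs pandas >= 1.1 (assumption about the installed version).
# ignore_index=True renumbers the rows 0..n-1, like reset_index(drop=True).
df_sorted = df.sort_values("ID", key=numeric_prefix, ignore_index=True)
print(df_sorted)
```

The key function only affects the ordering; the ID column itself keeps its original string values.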
CodePudding user response:
df.sort_values("Data")
You can also call sort_values directly on a string column; the values are then compared as strings, not as numbers.
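One thing to watch for with this approach: string comparison is character by character, so it matches numeric order only while every number has the same count of digits. A small sketch with made-up data (not from the question) that includes a two-digit value, again assuming pandas 1.1 or later for key=:

```python
import pandas as pd

# Made-up data with a two-digit value to show the difference.
demo = pd.DataFrame({"Data": ["D2", "D10", "D1"]})

# Plain string sort: "D10" lands before "D2" because "1" < "2".
as_strings = demo.sort_values("Data")["Data"].tolist()

# Numeric sort: drop the leading "D" and compare the rest as integers.
as_numbers = demo.sort_values(
    "Data", key=lambda s: s.str[1:].astype(int)
)["Data"].tolist()

print(as_strings)  # ['D1', 'D10', 'D2']
print(as_numbers)  # ['D1', 'D2', 'D10']
```

For the eight single-digit rows in the question both orderings agree, so the plain string sort is enough there.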