Pandas. Cannot sort by multiple columns

Time:05-17

Edited for clarity:

I have a DataFrame in the following format:

i    col1         col2  col3
0    00:00:00,1   10    1.7
1    00:00:00,2   10    1.5
2    00:00:00,3   50    4.6
3    00:00:00,4   30    3.4
4    00:00:00,5   20    5.6
5    00:00:00,6   50    1.8
6    00:00:00,9   20    1.9

...

I'm trying to sort it like this:

i    col1         col2  col3
0    00:00:00,1   10    1.7
1    00:00:00,2   10    1.5
4    00:00:00,5   20    5.6
6    00:00:00,9   20    1.9
3    00:00:00,4   30    3.4
2    00:00:00,3   50    4.6
5    00:00:00,6   50    1.8

...

I've tried df = df.sort_values(by=['col1', 'col2']), but the result is only ordered by col1. I suspect it may have something to do with the values being strings, but I can't find a workaround for it.

CodePudding user response:

If you need to sort each column independently, use Series.sort_values inside DataFrame.apply:

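c = ['col1', 'col2']
# sort each column on its own and assign the values back by position
df[c] = df[c].apply(lambda x: x.sort_values().to_numpy())
# alternative: return a plain list instead of a NumPy array
df[c] = df[c].apply(lambda x: x.sort_values().tolist())
print(df)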
   i        col1  col2
0  0  00:00:00,1    10
1  1  00:00:01,5    20
2  2  00:00:10,0    30
3  3  00:01:00,1    40
4  5  01:00:00,0    50
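
Keep in mind that sorting each column independently breaks the pairing between values that started out in the same row. A minimal sketch of that effect, using the sample rows from the question (building the DataFrame by hand and the name `independent` are assumptions for illustration):

```python
import pandas as pd

# Sample rows copied from the question; constructing them by hand is an assumption.
df = pd.DataFrame({
    "col1": ["00:00:00,1", "00:00:00,2", "00:00:00,3", "00:00:00,4",
             "00:00:00,5", "00:00:00,6", "00:00:00,9"],
    "col2": [10, 10, 50, 30, 20, 50, 20],
    "col3": [1.7, 1.5, 4.6, 3.4, 5.6, 1.8, 1.9],
})

c = ["col1", "col2"]
independent = df.copy()
# Each column is sorted on its own, then written back by position.
independent[c] = independent[c].apply(lambda x: x.sort_values().to_numpy())

# col2 is now ascending, but the original (col1, col2) pairs are gone:
# "00:00:00,3" started out next to 50 and now sits next to 20.
print(independent)
```

If the rows have to stay intact, sort the whole DataFrame by several keys instead of sorting column by column.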

CodePudding user response:

df.sort_values(by=['col2', 'col1'])

gave the desired result.
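
The order of the keys matters because sort_values orders by the first entry in `by` and only uses the later entries to break ties. Every col1 value in the sample is distinct, so with col1 first, col2 never comes into play. A minimal sketch with the sample rows from the question (the hand-built DataFrame and the names `by_col1_first` and `by_col2_first` are assumptions for illustration):

```python
import pandas as pd

# Sample rows copied from the question; constructing them by hand is an assumption.
df = pd.DataFrame({
    "col1": ["00:00:00,1", "00:00:00,2", "00:00:00,3", "00:00:00,4",
             "00:00:00,5", "00:00:00,6", "00:00:00,9"],
    "col2": [10, 10, 50, 30, 20, 50, 20],
    "col3": [1.7, 1.5, 4.6, 3.4, 5.6, 1.8, 1.9],
})

# col1 first: all col1 values are distinct, so col2 is never consulted
# and the rows stay in their original order.
by_col1_first = df.sort_values(by=["col1", "col2"])

# col2 first, with col1 as the tiebreaker among rows that share a col2 value.
by_col2_first = df.sort_values(by=["col2", "col1"])
print(by_col2_first)
```

Note that col1 is compared as text here. If the part after the comma can vary in length, string order may not match chronological order, and converting col1 to a time type before sorting would be one option.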
