Home > Back-end >  How to sort pandas rows based on column values
How to sort pandas rows based on column values

Time:06-24

in this dataframe:

Feat1   Feat2   Feat3   Feat4   Labels
-46.220314  22.862856   -6.1573067  5.6060414   2
-23.80669   20.536781   -5.015675   4.2216353   2
-42.092365  25.680704   -5.0092897  5.665794    2
-35.29639   21.709473   -4.160352   5.578346    2
-37.075096  22.347767   -3.860426   5.6953945   2
-42.8849    28.03802    -7.8572545  3.3361  2
-32.3057    26.568039   -9.47018    3.4532788   2
-24.469942  27.005375   -9.301921   4.3995037   2
-97.89892   -0.38156664 6.4163384   7.234347    1
-81.96325   0.1821717   -1.2870358  4.703838    1
-78.41986   -6.766374   0.8001185   0.83444935  1
-100.68544  -4.5810957  1.6977689   1.8801615   1
-87.05412   -2.9231584  6.817379    5.4460077   1
-64.121056  -3.7892206  -0.283514   6.3084154   1
-94.504845  -0.9999217  3.2884297   6.881124    1
-61.951996  -8.960198   -1.5915259  5.6160254   1
-108.19452  13.909201   0.6966458   -1.956591   0
-97.4037    22.897585   -2.8488266  1.4105041   0
-92.641335  22.10624    -3.5110545  2.467166    0
-199.18787  3.3090565   -2.5994794  4.0802555   0
-137.5976   6.795896    1.6793671   2.2256763   0
-208.0035   -1.33229    -3.2078092  1.5177402   0
-108.225975 14.341716   1.02891 -1.8651972  0
-121.29299  18.274035   2.2891548   2.3360753   0

I wanted to sort the rows based on different column values in the "Labels" column.

I am able to sort in ascending such that the labels appear as [0 1 2] via the command

df2 = df1.sort_values(by = 'Labels', ascending = True)

Then ascending = False, where the labels appear [2 1 0].

How then do I go about sorting the labels as [1 0 2]?

Any help will be greatly appreciated!

CodePudding user response:

Here's a way using Categorical:

df['Labels'] = pd.Categorical(df['Labels'],
                              categories = [1, 0, 2],
                              ordered=True)


df.sort_values('Labels')

Output:

         Feat1      Feat2     Feat3     Feat4 Labels
11 -100.685440  -4.581096  1.697769  1.880162      1
15  -61.951996  -8.960198 -1.591526  5.616025      1
8   -97.898920  -0.381567  6.416338  7.234347      1
9   -81.963250   0.182172 -1.287036  4.703838      1
10  -78.419860  -6.766374  0.800118  0.834449      1
14  -94.504845  -0.999922  3.288430  6.881124      1
12  -87.054120  -2.923158  6.817379  5.446008      1
13  -64.121056  -3.789221 -0.283514  6.308415      1
21 -208.003500  -1.332290 -3.207809  1.517740      0
20 -137.597600   6.795896  1.679367  2.225676      0
19 -199.187870   3.309057 -2.599479  4.080255      0
18  -92.641335  22.106240 -3.511055  2.467166      0
17  -97.403700  22.897585 -2.848827  1.410504      0
16 -108.194520  13.909201  0.696646 -1.956591      0
23 -121.292990  18.274035  2.289155  2.336075      0
22 -108.225975  14.341716  1.028910 -1.865197      0
7   -24.469942  27.005375 -9.301921  4.399504      2
6   -32.305700  26.568039 -9.470180  3.453279      2
5   -42.884900  28.038020 -7.857254  3.336100      2
4   -37.075096  22.347767 -3.860426  5.695394      2
3   -35.296390  21.709473 -4.160352  5.578346      2
2   -42.092365  25.680704 -5.009290  5.665794      2
1   -23.806690  20.536781 -5.015675  4.221635      2
0   -46.220314  22.862856 -6.157307  5.606041      2

CodePudding user response:

You can use an ordered Categorical, or if you don't want to change the DataFrame, the poor-man's variant, a mapping Series:

order = [1, 0, 2]
key = pd.Series({k:v for v,k in enumerate(order)}).get
# or
# pd.Series(range(len(order)), index=order).get

df1.sort_values(by='Labels', key=key)

Example:

df1 = pd.DataFrame({'Labels': [1,0,1,2,0,2,1]})

order = [1, 0, 2]
key = pd.Series({k:v for v,k in enumerate(order)}).get
print(df1.sort_values(by='Labels', key=key))

   Labels
0       1
2       1
6       1
1       0
4       0
3       2
5       2

CodePudding user response:

here is another way to do it

create a new column using map and map the new order sequence and then sort as usual

df['sort_label'] = df['Labels'].map({1:0, 0:1, 2:2 }) #).sort_values('sort_label', ascending=False)
df.sort_values('sort_label')
Feat1   Feat2   Feat3   Feat4   Labels  sort_label
11  -100.685440     -4.581096   1.697769    1.880162    1   0
15  -61.951996  -8.960198   -1.591526   5.616025    1   0
8   -97.898920  -0.381567   6.416338    7.234347    1   0
9   -81.963250  0.182172    -1.287036   4.703838    1   0
10  -78.419860  -6.766374   0.800119    0.834449    1   0
14  -94.504845  -0.999922   3.288430    6.881124    1   0
12  -87.054120  -2.923158   6.817379    5.446008    1   0
13  -64.121056  -3.789221   -0.283514   6.308415    1   0
21  -208.003500     -1.332290   -3.207809   1.517740    0   1
20  -137.597600     6.795896    1.679367    2.225676    0   1
19  -199.187870     3.309057    -2.599479   4.080255    0   1
18  -92.641335  22.106240   -3.511054   2.467166    0   1
17  -97.403700  22.897585   -2.848827   1.410504    0   1
16  -108.194520     13.909201   0.696646    -1.956591   0   1
23  -121.292990     18.274035   2.289155    2.336075    0   1
22  -108.225975     14.341716   1.028910    -1.865197   0   1
7   -24.469942  27.005375   -9.301921   4.399504    2   2
6   -32.305700  26.568039   -9.470180   3.453279    2   2
5   -42.884900  28.038020   -7.857254   3.336100    2   2
4   -37.075096  22.347767   -3.860426   5.695394    2   2
3   -35.296390  21.709473   -4.160352   5.578346    2   2
2   -42.092365  25.680704   -5.009290   5.665794    2   2
1   -23.806690  20.536781   -5.015675   4.221635    2   2
0   -46.220314  22.862856   -6.157307   5.606041    2   2

  • Related