Home > Enterprise >  How to sort Pandas crosstab columns by sum of values
How to sort Pandas crosstab columns by sum of values

Time:11-01

I have a crosstab table with 4 rows and multiple columns, containing numeral values (representing the number of dataset elements on the crossing of two factors). I want to sort the order of columns in the crosstab by the sum of values in each column. e.g. I have:

ct = pd.crosstab(df_flt_reg['experience'], df_flt_reg['region'])

     | a|  b| c|  d|  e|
0    | 1|  0| 7|  3|  6|
1    | 2|  4| 1|  5|  4|
2    | 3|  5| 0|  7|  2|
3    | 1|  3| 1|  9|  1|
(sum)| 7| 12| 9| 24| 13| # row doesn't exist, written here to make clear the logic

What do I want:

     |  d|  e|  b| c| a|
0    |  3|  6|  0| 7| 1|
1    |  5|  4|  4| 1| 2|
2    |  7|  2|  5| 0| 3|
3    |  9|  1|  3| 1| 1|
(sum)| 24| 13| 12| 9| 7|

I succeded only in sorting the columns by their names (in alphabet order), but that's not what I need. I tried to sum those values separately, made a list of properly ordered indexes and then addressed them to crosstab.sort_values() via "by" parameter, but it was sorted in alphabet order again. Also I tried to create a new row "sum", but succeded to create only a new column -_- So I humbly asking for the community's help.

CodePudding user response:

Calculate the sum and sort the values. Once you have the sorted series get the index and reorder your columns with it.

sorted_df = ct[ct.sum().sort_values(ascending=False).index]

   d  e  b  c  a
0  3  6  0  7  1
1  5  4  4  1  2
2  7  2  5  0  3
3  9  1  3  1  1
  • Related