I have a crosstab table with 4 rows and multiple columns, containing numeral values (representing the number of dataset elements on the crossing of two factors). I want to sort the order of columns in the crosstab by the sum of values in each column. e.g. I have:
ct = pd.crosstab(df_flt_reg['experience'], df_flt_reg['region'])
| a| b| c| d| e|
0 | 1| 0| 7| 3| 6|
1 | 2| 4| 1| 5| 4|
2 | 3| 5| 0| 7| 2|
3 | 1| 3| 1| 9| 1|
(sum)| 7| 12| 9| 24| 13| # row doesn't exist, written here to make clear the logic
What do I want:
| d| e| b| c| a|
0 | 3| 6| 0| 7| 1|
1 | 5| 4| 4| 1| 2|
2 | 7| 2| 5| 0| 3|
3 | 9| 1| 3| 1| 1|
(sum)| 24| 13| 12| 9| 7|
I succeded only in sorting the columns by their names (in alphabet order), but that's not what I need. I tried to sum those values separately, made a list of properly ordered indexes and then addressed them to crosstab.sort_values() via "by" parameter, but it was sorted in alphabet order again. Also I tried to create a new row "sum", but succeded to create only a new column -_- So I humbly asking for the community's help.
CodePudding user response:
Calculate the sum and sort the values. Once you have the sorted series get the index and reorder your columns with it.
sorted_df = ct[ct.sum().sort_values(ascending=False).index]
d e b c a
0 3 6 0 7 1
1 5 4 4 1 2
2 7 2 5 0 3
3 9 1 3 1 1