Ordering a stacked histplot based on total counts-CodePudding

I have a dataframe which results from:

df_grouped = df.groupby(['A', 'B'])['A'].count().sort_values(ascending=False)
df_grouped = pd.DataFrame(df_grouped)
new_index = pd.MultiIndex.from_tuples(df_grouped.index)
df_grouped.index = new_index
df_grouped.reset_index(inplace=True)
df_grouped.columns = ['A', 'B', 'count']

Then, df_grouped is something like:

A	B	count
A_1	B_1	10
A_1	B_2	51
A_1	B_3	25
A_1	B_4	12
A_1	B_5	2
A_2	B_1	19
A_2	B_3	5
A_3	B_5	18
A_3	B_4	33
A_3	B_5	44
A_4	B_1	29
A_5	B_2	32

I have plotted a seaborn.histplot using the following code:

fig, ax = plt.subplots(1, 1, figsize=(10,5))
sns.histplot(x='A', hue='B', data=df_grouped, ax=ax, multiple='stack', weights='count')

and results in the following image:

enter image description here

What I would like is to order the plot based on the total counts of each value of A. I have tried different methods, but I am not able to get a successful result.

Edit

I found a way to do what I wanted.

What I did, is to calculate the total counts by df['A'] values:

df['total_count'] = df.groupby(by='A')['count'].transform('sum')
df = df.sort_values(by=['total_count'], ascending=False)

Then, by using the same plot code from above, I got the desired result.

The answer is similar to what Redox proposed.

In any case, I will try the other options proposed.

CodePudding user response：

To be clear, the visualization is a stacked bar chart, it's not a histogram, as a histrogram represents the distribution of continuous values, while this is the counts of discrete categorical values.
This answer starts with the raw dataframe, not the dataframe created with .groupby.

The easiest way to do this is create a frequency table of the raw dataframe using

Data Views

`df`

     A    B
0  A_5  B_5
1  A_3  B_1
2  A_4  B_5
3  A_3  B_4
4  A_3  B_5

`dfc`

B    B_1  B_2  B_3  B_4  B_5  tot_A
A                                  
A_1   86  131   15   55   90    377
A_3   47   90    9   33   61    240
A_5   37   83   13   33   56    222
A_4   43   65    9   27   50    194
A_2   16   21    1    5   24     67

CodePudding user response：

You can use below code... you need to get the order that you want the graph sorted (A_1, A_3, ...), then sort your dataframe df_grouped before plotting.

#Groupby A and sort by count - send the output as list to ord
my_ord = list(df_grouped.groupby(by=['A']).sum().sort_values('count', ascending = False).index)

# Order and sort the dataframe so that the final graph is also sorted
df_grouped['A'] = pd.Categorical(df_grouped['A'], categories=my_ord, ordered=True)
df_grouped = df_grouped.sort_values('A')

#Plot graph - no change here
fig, ax = plt.subplots(1, 1, figsize=(10,5))
sns.histplot(x='A', hue='B', data=df_grouped, ax=ax, multiple='stack', weights='count')

Output

A	B	count
A_1	B_1	10
A_1	B_2	51
A_1	B_3	25
A_1	B_4	12
A_1	B_5	2
A_2	B_1	19
A_2	B_3	5
A_3	B_5	18
A_3	B_4	33
A_3	B_5	44
A_4	B_1	29
A_5	B_2	32

A	B	count
A_1	B_1	10
A_1	B_2	51
A_1	B_3	25
A_1	B_4	12
A_1	B_5	2
A_2	B_1	19
A_2	B_3	5
A_3	B_5	18
A_3	B_4	33
A_3	B_5	44
A_4	B_1	29
A_5	B_2	32

A	B	count
A_1	B_1	10
A_1	B_2	51
A_1	B_3	25
A_1	B_4	12
A_1	B_5	2
A_2	B_1	19
A_2	B_3	5
A_3	B_5	18
A_3	B_4	33
A_3	B_5	44
A_4	B_1	29
A_5	B_2	32