Home > Blockchain >  Ordering a stacked histplot based on total counts
Ordering a stacked histplot based on total counts

Time:06-17

I have a dataframe which results from:

df_grouped = df.groupby(['A', 'B'])['A'].count().sort_values(ascending=False)
df_grouped = pd.DataFrame(df_grouped)
new_index = pd.MultiIndex.from_tuples(df_grouped.index)
df_grouped.index = new_index
df_grouped.reset_index(inplace=True)
df_grouped.columns = ['A', 'B', 'count']

Then, df_grouped is something like:

A B count
A_1 B_1 10
A_1 B_2 51
A_1 B_3 25
A_1 B_4 12
A_1 B_5 2
A_2 B_1 19
A_2 B_3 5
A_3 B_5 18
A_3 B_4 33
A_3 B_5 44
A_4 B_1 29
A_5 B_2 32

I have plotted a seaborn.histplot using the following code:

fig, ax = plt.subplots(1, 1, figsize=(10,5))
sns.histplot(x='A', hue='B', data=df_grouped, ax=ax, multiple='stack', weights='count')

and results in the following image:

enter image description here

What I would like is to order the plot based on the total counts of each value of A. I have tried different methods, but I am not able to get a successful result.

Edit

I found a way to do what I wanted.

What I did, is to calculate the total counts by df['A'] values:

df['total_count'] = df.groupby(by='A')['count'].transform('sum')
df = df.sort_values(by=['total_count'], ascending=False)

Then, by using the same plot code from above, I got the desired result.

The answer is similar to what Redox proposed.

In any case, I will try the other options proposed.

CodePudding user response:

  • To be clear, the visualization is a stacked bar chart, it's not a histogram, as a histrogram represents the distribution of continuous values, while this is the counts of discrete categorical values.
  • This answer starts with the raw dataframe, not the dataframe created with .groupby.
  1. The easiest way to do this is create a frequency table of the raw dataframe using enter image description here

    Data Views

    df

         A    B
    0  A_5  B_5
    1  A_3  B_1
    2  A_4  B_5
    3  A_3  B_4
    4  A_3  B_5
    

    dfc

    B    B_1  B_2  B_3  B_4  B_5  tot_A
    A                                  
    A_1   86  131   15   55   90    377
    A_3   47   90    9   33   61    240
    A_5   37   83   13   33   56    222
    A_4   43   65    9   27   50    194
    A_2   16   21    1    5   24     67
    

    CodePudding user response:

    You can use below code... you need to get the order that you want the graph sorted (A_1, A_3, ...), then sort your dataframe df_grouped before plotting.

    #Groupby A and sort by count - send the output as list to ord
    my_ord = list(df_grouped.groupby(by=['A']).sum().sort_values('count', ascending = False).index)
    
    # Order and sort the dataframe so that the final graph is also sorted
    df_grouped['A'] = pd.Categorical(df_grouped['A'], categories=my_ord, ordered=True)
    df_grouped = df_grouped.sort_values('A')
    
    #Plot graph - no change here
    fig, ax = plt.subplots(1, 1, figsize=(10,5))
    sns.histplot(x='A', hue='B', data=df_grouped, ax=ax, multiple='stack', weights='count')
    

    Output

    enter image description here

  • Related