100% stacked and grouped bar plots using matplotlib

Apologies if this is a duplicate; I could not find an existing solution.

I have a data frame:

| sample_ids | perc_A | perc_B | perc_C |
|------------|--------|--------|--------|
| sample 1   | 0.75   | 0.18182| 0.42222|
| sample 2   | 0.66667| 0.24747| 0.15823|
| sample 3   | 0.70213| 0.28176| 0.17925|

With this, I would like to plot a 100% stacked and grouped bar chart (as shown below; a similar image taken from GitHub).

Detailed explanation based on the provided figure:
Let's say sample 1 is Apples. For bar A, 75% will be in dark purple (legend: True_perc_a) while 25% will be in light purple (legend: False_perc_a); for bar B, 18.18% will be in dark green (legend: True_perc_b) while 81.82% will be in light green (legend: False_perc_b); for bar C, 42.22% will be in dark yellow (legend: True_perc_c) while 57.78% will be in light yellow (legend: False_perc_c). The same applies to sample 2 and sample 3.

I was able to process the data to get the true and false percentages. For example:

df['perc_A'] = (df['perc_A']*100).round(2)
df['perc_F_A'] = (100 - df['perc_A']).round(2)

However, I am having difficulty plotting the figure.

CodePudding user response:

Because each bar always totals 100%, we can set every "False" value to 1 and draw it as a full-height bar behind the "True" bar. First, melt the dataframe on the sample_ids column, naming the new columns perc and percent, and multiply the values by 100 to turn them into percentages. Next, select the "False" rows by picking the entries in the perc column that contain an F, and plot them with Seaborn, setting hue to the perc name. Choose whatever palette you like and set alpha to 0.5 so the "False" portion is visibly lighter than the "True" portion. Finally, plot the "True" percentages on the same axes; they are drawn in front of the full-height bars, and the result looks like a stacked bar plot:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)

df = pd.DataFrame({'sample_ids':['sample1', 'sample2', 'sample3'], 
                   'perc_A':[0.75,0.66667,0.70213],
                   'perc_B':[0.18182,0.24747,0.28176],
                   'perc_C':[0.4222,0.15823,0.17925]})

df[['perc_F_A', 'perc_F_B', 'perc_F_C']] = 1

meltedDF = df.melt(id_vars=['sample_ids'], var_name='perc', value_name='percent')
meltedDF['percent']=meltedDF['percent']*100

sns.barplot(data=meltedDF[meltedDF.perc.str.contains('F')], x='sample_ids', y='percent', hue='perc', palette=['blue','green','red'], alpha=0.5)
sns.barplot(data=meltedDF[~meltedDF.perc.str.contains('F')], x='sample_ids', y='percent', hue='perc', palette=['blue','green','red'])
plt.show()
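
To make the melt and filter steps easier to follow, here is a minimal sketch that builds the same long-format frame and inspects it without plotting. The names `melted` and `is_false` are only illustrative, and the mask matches on `'_F_'` rather than `'F'`, which is an assumption about how the "False" columns are named:

```python
import pandas as pd

df = pd.DataFrame({'sample_ids': ['sample1', 'sample2', 'sample3'],
                   'perc_A': [0.75, 0.66667, 0.70213],
                   'perc_B': [0.18182, 0.24747, 0.28176],
                   'perc_C': [0.4222, 0.15823, 0.17925]})

# Placeholder "False" columns: a constant 1 (100%) used as the background bar.
df[['perc_F_A', 'perc_F_B', 'perc_F_C']] = 1

# Long format: one row per (sample, perc column) pair.
melted = df.melt(id_vars=['sample_ids'], var_name='perc', value_name='percent')
melted['percent'] = melted['percent'] * 100

# Boolean mask separating the background ("False") rows from the "True" rows.
is_false = melted['perc'].str.contains('_F_')

print(melted[is_false])    # nine rows, each with percent == 100
print(melted[~is_false])   # nine rows holding the real "True" percentages
```

Each of the two subsets is what gets passed to one of the `sns.barplot` calls above.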

As an aside, if you do want the actual "False" percentages, a better way to generate them is:

df = pd.DataFrame({'sample_ids':['sample1', 'sample2', 'sample3'], 
                   'perc_A':[0.75,0.66667,0.70213],
                   'perc_B':[0.18182,0.24747,0.28176],
                   'perc_C':[0.4222,0.15823,0.17925]})

# 1 minus each "True" share; no groupby is needed because the operation is row-wise
df[['perc_F_A', 'perc_F_B', 'perc_F_C']] = 1 - df[['perc_A', 'perc_B', 'perc_C']].to_numpy()

Output:


  sample_ids   perc_A   perc_B   perc_C  perc_F_A  perc_F_B  perc_F_C
0    sample1  0.75000  0.18182  0.42220   0.25000   0.81818   0.57780
1    sample2  0.66667  0.24747  0.15823   0.33333   0.75253   0.84177
2    sample3  0.70213  0.28176  0.17925   0.29787   0.71824   0.82075
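
If the real "False" values are used, the bars can also be stacked with plain matplotlib instead of being overlaid. The following is a rough sketch of that alternative, not the approach used above: it draws the "True" part first and then places the "False" part on top through the `bottom` argument of `ax.bar`. The colors, bar width, figure size, and legend labels are arbitrary assumptions:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'sample_ids': ['sample1', 'sample2', 'sample3'],
                   'perc_A': [0.75, 0.66667, 0.70213],
                   'perc_B': [0.18182, 0.24747, 0.28176],
                   'perc_C': [0.4222, 0.15823, 0.17925]})

true_cols = ['perc_A', 'perc_B', 'perc_C']
colors = ['purple', 'green', 'goldenrod']  # assumed colors, one per column
width = 0.25                               # assumed bar width
x = np.arange(len(df))                     # one group per sample

fig, ax = plt.subplots(figsize=(10, 5))
for i, (col, color) in enumerate(zip(true_cols, colors)):
    true_pct = df[col].to_numpy() * 100
    false_pct = 100 - true_pct
    # Shift each column's bars so the three bars sit side by side in a group.
    pos = x + (i - (len(true_cols) - 1) / 2) * width
    ax.bar(pos, true_pct, width, color=color, label=f'True_{col}')
    # Stack the "False" share on top of the "True" share, in a lighter shade.
    ax.bar(pos, false_pct, width, bottom=true_pct, color=color,
           alpha=0.4, label=f'False_{col}')

ax.set_xticks(x)
ax.set_xticklabels(df['sample_ids'])
ax.set_ylim(0, 100)
ax.set_ylabel('percent')
ax.legend(ncol=3, loc='lower center', bbox_to_anchor=(0.5, 1.02))
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Unlike the overlay approach, every segment here has its true height, so the values can be read directly from the bars.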