Bar of proportion of two variables

Time:09-23

I have a pandas DataFrame, as shown below:

import numpy as np
import pandas as pd

data = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
'baseline': [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
'endline': [1, 0, np.nan, 1, 0, 0, 1, np.nan, 1, 0, 0, 1, 0, 0, 1, 0, np.nan, np.nan, 1, 0, 1, np.nan, 0, 1, 0, 1, 0, np.nan, 1, 0, np.nan, 0, 0, 0, np.nan, 1, np.nan, 1, np.nan, 0, np.nan, 1, 1, 0, 1, 1, 1, 0, 1, 1]
}

df = pd.DataFrame(data)
df.head(n = 5)

The challenge is that the endline column may contain missing values. My goal is a bar chart with two bars side by side, one per variable, as in the image below.

[image: desired bar chart]

Thanks in advance!

Answer:

Seaborn prefers its data in "long form". Pandas' melt can convert the given dataframe to combine the 'baseline' and 'endline' columns.
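
To see what the long form looks like, here is a minimal sketch on a tiny made-up frame (df_small and its values are illustrative, not the question's data). With only value_vars given, melt should drop the other columns and return one row per (variable, value) pair:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values, not the question's data)
df_small = pd.DataFrame({
    'id': [1, 2, 3],
    'baseline': [1, 0, 1],
    'endline': [0, np.nan, 1],
})

# Stack the two columns into long form: the former column name goes
# into 'variable' and the cell content into 'value'
long_df = df_small.melt(value_vars=['baseline', 'endline'])
print(long_df)
```

The 'variable' column is what goes on the x-axis and 'value' on the y-axis; the missing endline entry stays in the frame as NaN.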

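By default, sns.barplot shows the mean when several y-values belong to the same x-value. You can supply a different estimator, e.g. one that sums the values, divides by their count, and multiplies by 100 to get a percentage. For 0/1 data that is just the mean scaled to 0-100. As far as I know, seaborn drops missing values before applying the estimator, so the endline percentage is computed over the non-missing rows only.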

Here is some code to get you started:

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np

data = {
'id': range(1, 51),
'baseline': [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
'endline': [1, 0, np.nan, 1, 0, 0, 1, np.nan, 1, 0, 0, 1, 0, 0, 1, 0, np.nan, np.nan, 1, 0, 1, np.nan, 0, 1, 0, 1, 0, np.nan, 1, 0, np.nan, 0, 0, 0, np.nan, 1, np.nan, 1, np.nan, 0, np.nan, 1, 1, 0, 1, 1, 1, 0, 1, 1]
}
df = pd.DataFrame(data)

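sns.set_style('white')
# melt() stacks 'baseline' and 'endline' into long form: columns 'variable' and 'value'
ax = sns.barplot(data=df.melt(value_vars=['baseline', 'endline']),
                 x='variable', y='value',
                 # share of 1s among the values passed to the estimator, as a percentage
                 estimator=lambda x: np.sum(x) / np.size(x) * 100,
                 ci=None,  # no error bars; seaborn >= 0.12 deprecates this in favor of errorbar=None
                 color='cornflowerblue')
# write the percentage above each bar (bar_label needs matplotlib >= 3.4)
ax.bar_label(ax.containers[0], fmt='%.1f %%', fontsize=20)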

sns.despine(ax=ax, left=True)
ax.grid(True, axis='y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
ax.set_xlabel('')
ax.set_ylabel('')
plt.tight_layout()
plt.show()

sns.barplot with percentages
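
As a rough cross-check of the bar heights, the same percentages can be computed directly in pandas. This is only a sketch on a small made-up frame (df_small is illustrative, not the question's data), and it assumes the plot's estimator only sees non-missing values, which is worth verifying for your seaborn version:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up values, not the question's data)
df_small = pd.DataFrame({
    'baseline': [1, 0, 1, 0],
    'endline': [1, np.nan, 1, 0],
})

# Series.mean() skips NaN by default, so each percentage is taken
# over the non-missing entries of that column
pct = df_small[['baseline', 'endline']].mean() * 100
print(pct.round(1))
```

Running the same line on the real df gives numbers to compare against the bar labels. If you would rather count missing endline values in the denominator, divide the column sums by len(df) instead.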
