I have the following dataframe:
df = pd.DataFrame(
{
'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]
}
)
status month counts
0 open January 2020 10
1 closed January 2020 12
2 open February 2020 32
3 closed February 2020 12
4 open April 2020 19
5 closed April 2020 40
6 open August 2020 10
7 closed August 2020 11
I'm trying to get a stacked bar plot using seaborn:
sns.histplot(df, x='month', weights='counts', hue='status', multiple='stack')
The purpose is to get a plot with a continuous timeseries without missing months. How can I fill in the missing rows with values so that the dataframe would look like below?
status month counts
open January 2020 10
closed January 2020 12
open February 2020 32
closed February 2020 12
open March 2020 0
closed March 2020 0
open April 2020 19
closed April 2020 40
open May 2020 0
closed May 2020 0
open June 2020 0
closed June 2020 0
open July 2020 0
closed July 2020 0
open August 2020 10
closed August 2020 11
CodePudding user response:
You could pivot the dataframe, and then reindex with the desired months.
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
df_pivoted = df.pivot(values='counts', index='month', columns='status').reindex(months).fillna(0)
ax = df_pivoted.plot.bar(stacked=True, width=1, ec='black', rot=0, figsize=(12, 5))
A seaborn solution, could use order=
. That doesn't work with a histplot
, only with a barplot
, which doesn't stack bars.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
plt.figure(figsize=(12, 5))
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
ax = sns.barplot(data=df, x='month', y='counts', hue='status', order=months)
plt.tight_layout()
plt.show()