Home > OS >  Fill in missing values for missing dates in dataframe
Fill in missing values for missing dates in dataframe

Time:02-19

I have the following dataframe:

df = pd.DataFrame(
    {
        'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
        'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
        'counts': [10, 12, 32, 12, 19, 40, 10, 11]
    }
)
    status  month           counts
0   open    January 2020    10
1   closed  January 2020    12
2   open    February 2020   32
3   closed  February 2020   12
4   open    April 2020      19
5   closed  April 2020      40
6   open    August 2020     10
7   closed  August 2020     11

I'm trying to get a stacked bar plot using seaborn:

sns.histplot(df, x='month', weights='counts', hue='status', multiple='stack')

enter image description here

The purpose is to get a plot with a continuous timeseries without missing months. How can I fill in the missing rows with values so that the dataframe would look like below?

status  month           counts
open    January 2020    10
closed  January 2020    12
open    February 2020   32
closed  February 2020   12
open    March 2020      0
closed  March 2020      0
open    April 2020      19
closed  April 2020      40
open    May 2020        0
closed  May 2020        0
open    June 2020       0
closed  June 2020       0
open    July 2020       0
closed  July 2020       0
open    August 2020     10
closed  August 2020     11

CodePudding user response:

You could pivot the dataframe, and then reindex with the desired months.

import pandas as pd

df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
                   'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
                   'counts': [10, 12, 32, 12, 19, 40, 10, 11]})

months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
df_pivoted = df.pivot(values='counts', index='month', columns='status').reindex(months).fillna(0)
ax = df_pivoted.plot.bar(stacked=True, width=1, ec='black', rot=0, figsize=(12, 5))

pandas bar plot using pivoted dataframe

A seaborn solution, could use order=. That doesn't work with a histplot, only with a barplot, which doesn't stack bars.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
                   'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
                   'counts': [10, 12, 32, 12, 19, 40, 10, 11]})

plt.figure(figsize=(12, 5))
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
ax = sns.barplot(data=df, x='month', y='counts', hue='status', order=months)
plt.tight_layout()
plt.show()

sns.barplot

  • Related