Home > Software engineering >  How can I fill in missing dates in a dataframe?
How can I fill in missing dates in a dataframe?

Time:12-12

I have a dataframe df_counts that contains the number of events that happen on a given day. My goal is to fill in all the missing dates, and assign them a count of 0.


       date    count
0   2012-03-14  8
1   2012-03-19  1
2   2012-04-07  3
3   2012-04-10  1
4   2012-04-19  5

Desired output:

       date    count
0   2012-03-14  8
1   2012-03-15  0
2   2012-03-16  0
3   2012-03-17  0
4   2012-03-18  0
5   2012-03-19  0
6   2012-03-20  0
7   2012-03-21  0
8   2012-03-22  0
9   2012-03-23  0
...

Links I've read through: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html

https://datatofish.com/pandas-dataframe-to-series/

Add missing dates to pandas dataframe

What I've tried:

idx = pd.date_range('2012-03-06', '2022-12-05')
s = df_counts.sequeeze
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx, fill_value = 0)
s.head()

Output of what I've tried:

AttributeError: 'DataFrame' object has no attribute 'sequeeze'

CodePudding user response:

You can achieve this using the asfreq() function.

# convert date column to datetime
df.date = pd.to_datetime(df.date)

# set date columns as index, drop the original index
df = df.set_index("date", drop=True)

# daily freq
df = df.asfreq("D", fill_value=0)

Some notes on your code:

  • squeeze() is a function, not an attribute. Therefore you get an error there.
  • You can directly use the date column as an index using the set_index() function.

CodePudding user response:

You can use resample:

new_df = df.set_index('date').resample('1D').agg('first').fillna(0)

Output:

'''
date    count
2012-03-14  8.0
2012-03-15  0.0
2012-03-16  0.0
2012-03-17  0.0
2012-03-18  0.0
2012-03-19  1.0
2012-03-20  0.0
2012-03-21  0.0
2012-03-22  0.0
2012-03-23  0.0
2012-03-24  0.0
2012-03-25  0.0
2012-03-26  0.0
2012-03-27  0.0
2012-03-28  0.0
2012-03-29  0.0
2012-03-30  0.0
2012-03-31  0.0
2012-04-01  0.0
2012-04-02  0.0
2012-04-03  0.0
2012-04-04  0.0
2012-04-05  0.0
2012-04-06  0.0
2012-04-07  3.0
2012-04-08  0.0
2012-04-09  0.0
2012-04-10  1.0
2012-04-11  0.0
2012-04-12  0.0
2012-04-13  0.0
2012-04-14  0.0
2012-04-15  0.0
2012-04-16  0.0
2012-04-17  0.0
2012-04-18  0.0
2012-04-19  5.0

'''

CodePudding user response:

My favorite is fillna, you can use it for whole df or single column:

df["column"].fillna(0)
  • Related