I have a dataframe df_counts
that contains the number of events that happen on a given day.
My goal is to fill in all the missing dates, and assign them a count of 0.
date count
0 2012-03-14 8
1 2012-03-19 1
2 2012-04-07 3
3 2012-04-10 1
4 2012-04-19 5
Desired output:
date count
0 2012-03-14 8
1 2012-03-15 0
2 2012-03-16 0
3 2012-03-17 0
4 2012-03-18 0
5 2012-03-19 0
6 2012-03-20 0
7 2012-03-21 0
8 2012-03-22 0
9 2012-03-23 0
...
Links I've read through: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html
https://datatofish.com/pandas-dataframe-to-series/
Add missing dates to pandas dataframe
What I've tried:
idx = pd.date_range('2012-03-06', '2022-12-05')
s = df_counts.sequeeze
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx, fill_value = 0)
s.head()
Output of what I've tried:
AttributeError: 'DataFrame' object has no attribute 'sequeeze'
CodePudding user response:
You can achieve this using the asfreq()
function.
# convert date column to datetime
df.date = pd.to_datetime(df.date)
# set date columns as index, drop the original index
df = df.set_index("date", drop=True)
# daily freq
df = df.asfreq("D", fill_value=0)
Some notes on your code:
squeeze()
is a function, not an attribute. Therefore you get an error there.- You can directly use the
date
column as an index using theset_index()
function.
CodePudding user response:
You can use resample
:
new_df = df.set_index('date').resample('1D').agg('first').fillna(0)
Output:
'''
date count
2012-03-14 8.0
2012-03-15 0.0
2012-03-16 0.0
2012-03-17 0.0
2012-03-18 0.0
2012-03-19 1.0
2012-03-20 0.0
2012-03-21 0.0
2012-03-22 0.0
2012-03-23 0.0
2012-03-24 0.0
2012-03-25 0.0
2012-03-26 0.0
2012-03-27 0.0
2012-03-28 0.0
2012-03-29 0.0
2012-03-30 0.0
2012-03-31 0.0
2012-04-01 0.0
2012-04-02 0.0
2012-04-03 0.0
2012-04-04 0.0
2012-04-05 0.0
2012-04-06 0.0
2012-04-07 3.0
2012-04-08 0.0
2012-04-09 0.0
2012-04-10 1.0
2012-04-11 0.0
2012-04-12 0.0
2012-04-13 0.0
2012-04-14 0.0
2012-04-15 0.0
2012-04-16 0.0
2012-04-17 0.0
2012-04-18 0.0
2012-04-19 5.0
'''
CodePudding user response:
My favorite is fillna
, you can use it for whole df or single column:
df["column"].fillna(0)