Very standard task: data can have multiple events on a given date or no events. I to take these events, enrich the in the data missing dates in given range and zero values, so later I can a count by date and plot the data.
I'm new to pandas and have followed reindex to add missing dates to pandas dataframe and pandas fill missing dates in time series. Although it was accepted by a number of stackoverflow readers, it does not work for me, as it Nullifies already existing 'quantity' column values:
CODE:
# (items_per_day is coming from DB)
date_now = datetime.now().date()
date_period = date_now-timedelta(weeks=1)
data = list(items_per_day)
print('data: ', data)
idx = pd.period_range(date_period, date_now)
df = pd.DataFrame(data)
print('df: ', df)
df.index = pd.DatetimeIndex(df.index)
print('df index: ', df)
df = df.reindex(idx, fill_value=0)
print('df: ', df)
OUTPUT:
data: [{'date': datetime.date(2022, 8, 17), 'quantity': 1}, {'date': datetime.date(2022, 8, 18), 'quantity': 2}, {'date': datetime.date(2022, 8, 19), 'quantity': 3}]
df: date quantity
0 2022-08-17 1
1 2022-08-18 2
2 2022-08-19 3
df index: date quantity
1970-01-01 00:00:00.000000000 2022-08-17 1
1970-01-01 00:00:00.000000001 2022-08-18 2
1970-01-01 00:00:00.000000002 2022-08-19 3
df: date quantity
2022-08-13 0 0
2022-08-14 0 0
2022-08-15 0 0
2022-08-16 0 0
2022-08-17 0 0
2022-08-18 0 0
2022-08-19 0 0
2022-08-20 0 0
It looks like that reindex losses the data with or without 'fill_value' param! Although Add missing dates pandas dataframe solution based on different principal works:
CODE:
date_now = datetime.now().date()
date_period = date_now-timedelta(weeks=1)
data = list(items_per_day)
print('data: ', data)
idx = pd.period_range(date_period, date_now)
df = pd.DataFrame(data)
print('df: ', df)
df['date'] = pd.to_datetime(df['date'])
dates = pd.date_range(start=date_period, freq='D', periods=7)
df = df.set_index('date').reindex(dates.normalize().rename('date')).reset_index()
df['quantity'] = df['quantity'].fillna(0)
print('df: ', df)
OUTPUT:
data: [{'date': datetime.date(2022, 8, 17), 'quantity': 1}, {'date': datetime.date(2022, 8, 18), 'quantity': 2}, {'date': datetime.date(2022, 8, 19), 'quantity': 3}]
df: date quantity
0 2022-08-17 1
1 2022-08-18 2
2 2022-08-19 3
df: date quantity
0 2022-08-13 0.0
1 2022-08-14 0.0
2 2022-08-15 0.0
3 2022-08-16 0.0
4 2022-08-17 1.0
5 2022-08-18 2.0
6 2022-08-19 3.0
I would like to understand why the first solution does not work in my case (it obviously worked for a number of people who voted to it!). What is the root cause of the failure in my case?
CodePudding user response:
In your first approach, you are reindexing a DatetimeIndex
with a PeriodIndex
(created by period_range), use date_range
instead of period_range
works:
idx = pd.date_range(date_period, date_now)
df.index = pd.DatetimeIndex(df.date)
df.reindex(idx, fill_value=0)
# date quantity
#2022-08-13 0 0
#2022-08-14 0 0
#2022-08-15 0 0
#2022-08-16 0 0
#2022-08-17 2022-08-17 1
#2022-08-18 2022-08-18 2
#2022-08-19 2022-08-19 3
#2022-08-20 0 0