I have a DataFrame that represents hotel reservations by date:
date volume city
2020-01-05 10 NY
2020-01-06 10 NY
2020-01-07 30 NY
The DataFrame is ultimately written to a database, so I need a complete range of dates
from a given point in the past to a given point in the future. For example, the DataFrame I need for all of 2020 should look like this:
date volume city
2020-01-01 0 NY
2020-01-02 0 NY
2020-01-03 0 NY
2020-01-04 0 NY
2020-01-05 10 NY
2020-01-06 10 NY
2020-01-07 30 NY
...
2020-12-31 0 NY
It's important that all the rows added to fill the range have volume=0
and that the city is repeated throughout the dataset.
How can I efficiently fill in the dates missing from the range in my DataFrame?
CodePudding user response:
Use DataFrame.reindex with date_range, replace the missing values in volume with 0, and set the city column to NY:
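import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# every calendar day of 2020
r = pd.date_range('2020-01-01','2020-12-31')
# rows added by reindex are NaN; fillna sets their volume to 0 (as float, because NaN made the column float)
df = df.set_index('date').reindex(r).fillna({'volume':0}).assign(city = 'NY')
print(df)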
volume city
2020-01-01 0.0 NY
2020-01-02 0.0 NY
2020-01-03 0.0 NY
2020-01-04 0.0 NY
2020-01-05 10.0 NY
... ...
2020-12-27 0.0 NY
2020-12-28 0.0 NY
2020-12-29 0.0 NY
2020-12-30 0.0 NY
2020-12-31 0.0 NY
[366 rows x 2 columns]
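Note that volume comes back as float (0.0, 10.0) because reindex first inserts NaN. If the database column is an integer, one possible variant is to reindex only the volume column with fill_value=0, which avoids the NaN step. This is a minimal sketch that rebuilds the sample data from the question; the names full_range, volume and filled are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2020-01-05", "2020-01-06", "2020-01-07"],
    "volume": [10, 10, 30],
    "city": ["NY", "NY", "NY"],
})
df["date"] = pd.to_datetime(df["date"])

# Every calendar day of 2020
full_range = pd.date_range("2020-01-01", "2020-12-31")

# Reindex the volume Series directly: missing dates get 0, so no NaN
# is ever introduced and the integer dtype is preserved.
volume = df.set_index("date")["volume"].reindex(full_range, fill_value=0)

# Turn the index back into a date column and repeat the city on every row.
filled = volume.rename_axis("date").reset_index().assign(city="NY")

print(filled.head(7))
print(filled["volume"].dtype)
```

An alternative with the same effect is to keep the fillna version above and append .astype({'volume': int}) at the end of the chain.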
If there may be multiple cities and you need the full date range for each city, build the new index with MultiIndex.from_product:
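df['date'] = pd.to_datetime(df['date'])
r = pd.date_range('2020-01-01','2020-12-31')
# every (date, city) combination for the cities present in the data
mux = pd.MultiIndex.from_product([r, df['city'].unique()], names=['date','city'])
# fill_value=0 sets volume to 0 for the added rows and keeps the integer dtype
df = df.set_index(['date', 'city']).reindex(mux, fill_value=0).reset_index()
print(df)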
date city volume
0 2020-01-01 NY 0
1 2020-01-02 NY 0
2 2020-01-03 NY 0
3 2020-01-04 NY 0
4 2020-01-05 NY 10
.. ... ... ...
361 2020-12-27 NY 0
362 2020-12-28 NY 0
363 2020-12-29 NY 0
364 2020-12-30 NY 0
365 2020-12-31 NY 0
[366 rows x 3 columns]
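The sample data only contains NY, so the output above looks the same as in the single-city case. To see the effect with more than one city, here is a small self-contained sketch. The LA row is made up purely for illustration and is not part of the original data; the final sort_values is optional and only groups the rows by city.

```python
import pandas as pd

# Sample data from the question plus one hypothetical LA row
df = pd.DataFrame({
    "date": ["2020-01-05", "2020-01-06", "2020-01-07", "2020-01-06"],
    "volume": [10, 10, 30, 5],
    "city": ["NY", "NY", "NY", "LA"],
})
df["date"] = pd.to_datetime(df["date"])

full_range = pd.date_range("2020-01-01", "2020-12-31")

# Cartesian product: every date paired with every city seen in the data
mux = pd.MultiIndex.from_product(
    [full_range, df["city"].unique()], names=["date", "city"]
)

filled = (
    df.set_index(["date", "city"])
      .reindex(mux, fill_value=0)      # missing (date, city) pairs get volume 0
      .reset_index()
      .sort_values(["city", "date"], ignore_index=True)
)

# Each city should now have one row per day of 2020
print(filled.groupby("city").size())
```

reindex needs the (date, city) pairs in the original data to be unique. If the same city can appear more than once on a date, aggregating first (for example with groupby(['date', 'city']).sum()) is probably necessary.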