I need to fill in missing dates in a pandas DataFrame. The DataFrame consists of weekly sales data for multiple items. I am looping through each item to check for missing weeks, with the intention of filling in those dates with a 0 for sales and copying all other information down.
I use the following code to find the missing dates:
pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date'])
While I can print the missing dates and search manually for the few items that are missing sales weeks, I have not found a way to do this programmatically.
I tried
for item in df['ord_base7'].unique():
    df_ = df[df['ord_base7'] == item]
    if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).isnan() == True:
        pass
    else:
        print(item, pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']))
That yielded the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_55320/2582723605.py in <module>
1 for item in df['ord_base7'].unique():
2 df_ = df[df['ord_base7'] == item]
----> 3 if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).isnan() == True:
4 pass
5 else:
AttributeError: 'DatetimeIndex' object has no attribute 'isnan'
How can I program a way to see if there are no dates missing so those items can be passed over?
CodePudding user response:
Looping over a pandas DataFrame is usually inefficient. To replace NaN values, use .fillna() and pass in whatever value you want in place of NaN:
df['week_date'].fillna(0)
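One caveat worth checking: fillna only replaces NaN values in rows that already exist, so a week that is entirely absent from the frame has no row for it to fill. Here is a minimal sketch with made-up data (the sales column and the values are assumptions for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data: the week of 2017-01-21 has no row at all,
# and the row for 2017-01-28 has a NaN sales value.
df = pd.DataFrame({
    "week_date": pd.to_datetime(["2017-01-14", "2017-01-28"]),
    "sales": [5.0, None],
})

# Replace NaN in the sales column only.
filled = df.fillna({"sales": 0})

# The existing NaN is replaced, but the row count is unchanged,
# so the missing week is still missing.
n_rows = len(filled)
has_missing_week = pd.Timestamp("2017-01-21") in set(filled["week_date"])
print(n_rows, has_missing_week)
```

So fillna helps once the missing weeks exist as rows, but it does not create them.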
CodePudding user response:
Never mind... I just tried the following and it worked.
for item in df['ord_base7'].unique():
    df_ = df[df['ord_base7'] == item]
    if pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']).empty == True:
        pass
    else:
        print(item, pd.date_range(start="2017-01-13", end="2022-12-16", freq = "W-SAT").difference(df_['week_date']))
The .empty attribute is how to check whether a DatetimeIndex has no elements.
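For the original goal of actually filling in the gaps, here is a rough sketch of one possible approach using groupby and reindex. It assumes one row per item per week; the sales and desc column names, the helper names missing_weeks and fill_missing_weeks, and the toy data are all made up for illustration, so adjust them to the real frame:

```python
import pandas as pd

def missing_weeks(df, all_weeks, item_col="ord_base7", date_col="week_date"):
    """Map each item that has gaps to the DatetimeIndex of its missing weeks."""
    gaps = {}
    for item, group in df.groupby(item_col):
        missing = all_weeks.difference(pd.DatetimeIndex(group[date_col]))
        if not missing.empty:  # .empty is True when nothing is missing
            gaps[item] = missing
    return gaps

def fill_missing_weeks(df, all_weeks, item_col="ord_base7",
                       date_col="week_date", sales_col="sales"):
    """Add a row for every missing week: 0 sales, other columns copied down."""
    filled = []
    for item, group in df.groupby(item_col):
        # Reindexing on the full calendar inserts all-NaN rows for missing weeks.
        g = group.set_index(date_col).reindex(all_weeks)
        is_new = ~g.index.isin(group[date_col])
        g.loc[is_new, sales_col] = 0
        # Copy the remaining columns down (and up, if the first week is missing).
        other = g.columns.difference([sales_col])
        g[other] = g[other].ffill().bfill()
        filled.append(g.rename_axis(date_col).reset_index())
    return pd.concat(filled, ignore_index=True)

# Hypothetical toy data: item "B" has no row for 2017-01-21.
df = pd.DataFrame({
    "ord_base7": ["A", "A", "A", "B", "B"],
    "week_date": pd.to_datetime(
        ["2017-01-14", "2017-01-21", "2017-01-28", "2017-01-14", "2017-01-28"]),
    "sales": [1, 2, 3, 4, 5],
    "desc": ["a", "a", "a", "b", "b"],
})
all_weeks = pd.date_range(start="2017-01-13", end="2017-01-28", freq="W-SAT")

gaps = missing_weeks(df, all_weeks)
filled = fill_missing_weeks(df, all_weeks)
print(gaps)
print(filled)
```

Building the date range once and reusing it also avoids recomputing it twice per item, as the loop above does.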