I have a dataframe that contains multiple time series, like so:
Date | Item | Category |
---|---|---|
2021-01-01 | gouda | cheese |
2021-01-02 | gouda | cheese |
2021-01-04 | gouda | cheese |
2021-01-05 | gouda | cheese |
2021-02-01 | lettuce | produce |
2021-02-02 | lettuce | produce |
2021-02-03 | lettuce | produce |
2021-02-05 | lettuce | produce |
I'd like to add rows for the missing dates (ex. 2021-01-03 for gouda, 2021-02-04 for lettuce). Note that these series do not necessarily start and end on the same date.
What is the best way to do this in pandas? I'd also like fill the new rows with the values in the "item" and "category" column for that series.
CodePudding user response:
Group by Item
and Category
, then generate a time series from the min to the max date:
result = (
df.groupby(["Item", "Category"])["Date"]
.apply(lambda s: pd.date_range(s.min(), s.max()))
.explode()
.reset_index()
)
CodePudding user response:
You can do resample
df['Date'] = pd.to_datetime(df['Date'])
df['Y-m'] = df['Date'].dt.strftime('%y-%m')
out = df.groupby('Y-m').apply(lambda x : x.set_index('Date').resample('D').ffill()).reset_index(level=1)
CodePudding user response:
This is far from optimal, but it is how I would do in order to ensure all categories and items are within the min and max periods, and all ranges are filled:
aux = []
for x in df['Item'].unique():
_ = pd.DataFrame({'Date':pd.date_range(df[df['Item']==x]['Date'].min(),df[df['Item']==x]['Date'].max(),freq='d')})
_['Item'] = x
_['Category'] = df[df['Item']==x]['Category'].values[0]
aux.append(_)
output = pd.concat(aux)
Consider this sample dataset:
df = pd.DataFrame({'Date':['2021-01-01','2021-01-02','2021-01-04','2021-01-05','2021-01-01','2021-01-02','2021-01-04','2021-01-05'],
'Item':['gouda','gouda','gouda','gouda','lettuce','lettuce','lettuce','lettuce'],
'Category':['cheese','cheese','cheese','cheese','produce','produce','produce','produce']})
df['Date'] = pd.to_datetime(df['Date'],infer_datetime_format=True)
Outputs:
Date Item Category
0 2021-01-01 gouda cheese
1 2021-01-02 gouda cheese
2 2021-01-03 gouda cheese
3 2021-01-04 gouda cheese
4 2021-01-05 gouda cheese
0 2021-01-01 lettuce produce
1 2021-01-02 lettuce produce
2 2021-01-03 lettuce produce
3 2021-01-04 lettuce produce
4 2021-01-05 lettuce produce