My df looks like the following:
ds         | col1 | col2 | col3 | values
01/01/2020 | x0   | y0   | z0   | 12
01/02/2020 | x0   | y0   | z0   | 11
01/03/2020 | x1   | y0   | z0   | 14
01/02/2020 | x0   | y1   | z0   | 19
01/03/2020 | x0   | y1   | z0   | 11
With a fixed start date of 01/01/2020 and end date of 01/03/2020, I want to fill in the missing dates for each combination of col1, col2, and col3, leaving values as NaN on the added rows. The output should be the following:
ds         | col1 | col2 | col3 | values
01/01/2020 | x0   | y0   | z0   | 12
01/02/2020 | x0   | y0   | z0   | 11
01/03/2020 | x0   | y0   | z0   | NaN
01/01/2020 | x1   | y0   | z0   | NaN
01/02/2020 | x1   | y0   | z0   | NaN
01/03/2020 | x1   | y0   | z0   | 14
01/01/2020 | x0   | y1   | z0   | NaN
01/02/2020 | x0   | y1   | z0   | 19
01/03/2020 | x0   | y1   | z0   | 11
CodePudding user response:
Try:
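import pandas as pd

# ensure datetime (dayfirst=True reads 01/02/2020 as 1 February 2020):
df["ds"] = pd.to_datetime(df["ds"], dayfirst=True)

# fixed range from the start date to the end date, one entry per month start
dr = pd.date_range("2020-01-01", "2020-03-01", freq="MS")


def reindex(df, cols_to_fill=("col1", "col2", "col3")):
    cols = list(cols_to_fill)
    # add the missing dates; the new rows are all-NaN
    df = df.set_index("ds").reindex(dr)
    # copy the group's key values into the new rows; "values" stays NaN
    df[cols] = df[cols].ffill().bfill()
    return df.reset_index().rename(columns={"index": "ds"})


df = (
    df.groupby(["col1", "col2", "col3"], sort=False, group_keys=False)
    .apply(reindex)
    .reset_index(drop=True)
)
print(df)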
Prints:
          ds col1 col2 col3  values
0 2020-01-01   x0   y0   z0    12.0
1 2020-02-01   x0   y0   z0    11.0
2 2020-03-01   x0   y0   z0     NaN
3 2020-01-01   x1   y0   z0     NaN
4 2020-02-01   x1   y0   z0     NaN
5 2020-03-01   x1   y0   z0    14.0
6 2020-01-01   x0   y1   z0     NaN
7 2020-02-01   x0   y1   z0    19.0
8 2020-03-01   x0   y1   z0    11.0
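
If you would rather avoid groupby/apply, the same result can probably be reached with two merges. This is only a sketch, not part of the answer above: it rebuilds the sample frame from the question, assumes the dates are day/month/year with one row per month start, and uses names of my own choosing (keys, dates, combos, grid, filled). It also needs merge(how="cross"), which is available from pandas 1.2 onward.

```python
import pandas as pd

# Sample data copied from the question (assumed to be day/month/year).
df = pd.DataFrame(
    {
        "ds": ["01/01/2020", "01/02/2020", "01/03/2020", "01/02/2020", "01/03/2020"],
        "col1": ["x0", "x0", "x1", "x0", "x0"],
        "col2": ["y0", "y0", "y0", "y1", "y1"],
        "col3": ["z0", "z0", "z0", "z0", "z0"],
        "values": [12, 11, 14, 19, 11],
    }
)
df["ds"] = pd.to_datetime(df["ds"], dayfirst=True)

keys = ["col1", "col2", "col3"]

# Every date in the fixed range (month starts, as in the answer above).
dates = pd.DataFrame({"ds": pd.date_range("2020-01-01", "2020-03-01", freq="MS")})

# Each combination that actually occurs, in order of first appearance.
combos = df[keys].drop_duplicates()

# The cross join gives one row per (combination, date); the left merge then
# pulls in the known values and leaves NaN where a date was missing.
grid = combos.merge(dates, how="cross")
filled = grid.merge(df, on=keys + ["ds"], how="left")[["ds"] + keys + ["values"]]

print(filled)
```

Only the combinations that already exist in df are expanded, so you get three dates for each of the three groups rather than the full product of all col1, col2 and col3 values.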