How to fill the missing dates for each combination of multiple columns with NaN values

My df looks like the following:

ds         | col1 | col2 | col3 | values
01/01/2020 | x0   | y0   | z0   | 12
01/02/2020 | x0   | y0   | z0   | 11
01/03/2020 | x1   | y0   | z0   | 14
01/02/2020 | x0   | y1   | z0   | 19
01/03/2020 | x0   | y1   | z0   | 11

With a fixed start date of 01/01/2020 and a fixed end date of 01/03/2020, I want to add the missing dates for each existing combination of col1, col2, and col3, with NaN in the values column for the added rows. The output should be the following:

ds         | col1 | col2 | col3 | values
01/01/2020 | x0   | y0   | z0   | 12
01/02/2020 | x0   | y0   | z0   | 11
01/03/2020 | x0   | y0   | z0   | NaN
01/01/2020 | x1   | y0   | z0   | NaN
01/02/2020 | x1   | y0   | z0   | NaN
01/03/2020 | x1   | y0   | z0   | 14
01/01/2020 | x0   | y1   | z0   | NaN
01/02/2020 | x0   | y1   | z0   | 19
01/03/2020 | x0   | y1   | z0   | 11
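To make the example reproducible, the input frame can be rebuilt roughly as below. This is only a sketch: the dates are kept as plain strings exactly as written in the table, and whether they are day-first or month-first is not stated in the question.

```python
import pandas as pd

# Sample data copied from the table in the question; "ds" is still a string column here.
df = pd.DataFrame(
    {
        "ds": ["01/01/2020", "01/02/2020", "01/03/2020", "01/02/2020", "01/03/2020"],
        "col1": ["x0", "x0", "x1", "x0", "x0"],
        "col2": ["y0", "y0", "y0", "y1", "y1"],
        "col3": ["z0", "z0", "z0", "z0", "z0"],
        "values": [12, 11, 14, 19, 11],
    }
)
print(df)
```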

CodePudding user response:

Try:

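import pandas as pd

# ensure datetime (the dates are read as day-first, i.e. 1 Jan, 1 Feb, 1 Mar):
df["ds"] = pd.to_datetime(df["ds"], dayfirst=True)

# full range from the fixed start date to the fixed end date, one entry per month start
dr = pd.date_range("2020-01-01", "2020-03-01", freq="MS")


def reindex(df, cols_to_fill=("col1", "col2", "col3")):
    cols = list(cols_to_fill)  # select the columns with a list rather than a tuple
    # add the missing dates for this group; the new rows are all NaN
    df = df.set_index("ds").reindex(dr)
    # restore the group's key columns on the new rows, leaving "values" as NaN
    df.loc[:, cols] = df.loc[:, cols].ffill().bfill()
    return df.reset_index().rename(columns={"index": "ds"})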


df = (
    df.groupby(["col1", "col2", "col3"], sort=False, group_keys=False)
    .apply(reindex)
    .reset_index(drop=True)
)
print(df)

Prints:

          ds col1 col2 col3  values
0 2020-01-01   x0   y0   z0    12.0
1 2020-02-01   x0   y0   z0    11.0
2 2020-03-01   x0   y0   z0     NaN
3 2020-01-01   x1   y0   z0     NaN
4 2020-02-01   x1   y0   z0     NaN
5 2020-03-01   x1   y0   z0    14.0
6 2020-01-01   x0   y1   z0     NaN
7 2020-02-01   x0   y1   z0    19.0
8 2020-03-01   x0   y1   z0    11.0
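The same result can also be reached without groupby/apply, by building every (existing combination, date) pair first and then left-joining the known values onto it. The following is a sketch, not part of the original answer: the names keys, dates, combos, grid and out are made up for illustration, the sample frame is rebuilt from the question, and how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd

# Sample data from the question (dates read as day-first, as in the answer above).
df = pd.DataFrame(
    {
        "ds": ["01/01/2020", "01/02/2020", "01/03/2020", "01/02/2020", "01/03/2020"],
        "col1": ["x0", "x0", "x1", "x0", "x0"],
        "col2": ["y0", "y0", "y0", "y1", "y1"],
        "col3": ["z0", "z0", "z0", "z0", "z0"],
        "values": [12, 11, 14, 19, 11],
    }
)
df["ds"] = pd.to_datetime(df["ds"], dayfirst=True)

keys = ["col1", "col2", "col3"]
dates = pd.DataFrame({"ds": pd.date_range("2020-01-01", "2020-03-01", freq="MS")})

# One row per combination that actually occurs, in order of first appearance.
combos = df[keys].drop_duplicates()

# Cross join: pair every existing combination with every date in the range.
grid = combos.merge(dates, how="cross")

# Left join the known values; pairs with no match get NaN in "values".
out = grid.merge(df, on=keys + ["ds"], how="left")[["ds", *keys, "values"]]
print(out)
```

Because only the combinations present in the data are crossed with the dates, a pair such as (x1, y1, z0) that never occurs is not added.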