Fill missing cohort indices-CodePudding

I have a data frame like that

week	Revenue	Cohort_index
19/09/2021	120	0
19/09/2021	150	1
19/09/2021	223	2
19/09/2021	256	4
20/09/2021	340	0
20/09/2021	126	1
20/09/2021	234	2

Now I'd like to check if cohort_index is missing (3 in this case for date 19/09/2021 & also 3 for 20/09/2021) , then insert a new row with the missing index.

Maximum number of indices will be decreasing so let's say for 19/09/2021 the maximum number of cohort index is 4 , so the next date 20/09/2021 will have 3 indices.. and I need to fill all missing indices from the minimum to maximum

rest of column values are copied from the previous row except Revenue that will be filled with 0 while updating the data frame index.

Data is more granular than what I have posted , so for every date and every cohort_index I have different countries and different device types.

Desired Output :

week	Revenue	Cohort_index
19/09/2021	120	0
19/09/2021	150	1
19/09/2021	223	2
19/09/2021	0	3
19/09/2021	256	4
20/09/2021	340	0
20/09/2021	126	1
20/09/2021	234	2
20/09/2021	0	3

I think a For loop is needed , but I can't get my head around it.

CodePudding user response：

A bit more tricky than I'd like but it should work. Could be simplified a little if there was a way to get current group number in groupby apply, is there?

columns = df.columns
max_cohort = 4
df["ngroup"] = df.groupby('week').ngroup()

out = (
 df.groupby('week', as_index=False)
   .apply(lambda g: g.set_index('Cohort_index')
                     .reindex(range(max_cohort   1 - max(g["ngroup"])))
   )).droplevel(0).reset_index()[columns]

out["week"] = out["week"].fillna(method="ffill")
out["Revenue"] = out["Revenue"].fillna(0)

results is

          week  Revenue  Cohort_index
0  19/09/2021     120.0             0
1  19/09/2021     150.0             1
2  19/09/2021     223.0             2
3  19/09/2021       0.0             3
4  19/09/2021     256.0             4
5  20/09/2021     340.0             0
6  20/09/2021     126.0             1
7  20/09/2021     234.0             2
8  20/09/2021       0.0             3