i work with customers consumptions and sometime didn't have this consumption for month or more so the first consumption after that need to break it down into those months example
df = pd.DataFrame({'customerId':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'month':['2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01','2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01'],
'consumption':[100,130,0,0,400,140,105,500,0,0,0,0,0,3300]})
bfill() return same value not mean (value/count of null 1) desired value
'c':[100,130,133,133,133,140,105,500,550,550,550,550,550,550]
CodePudding user response:
You can try something like this:
df = pd.DataFrame({'customerId':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'month':['2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01','2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01'],
'consumption':[100,130,0,0,400,140,105,500,0,0,0,0,0,3300]})
df['grp'] = df['consumption'].ne(0)[::-1].cumsum()
df['c'] = df.groupby(['customerId', 'grp'])['consumption'].transform('mean')
df
Output:
customerId month consumption grp c
0 1 2021-10-01 100 7 100.000000
1 1 2021-11-01 130 6 130.000000
2 1 2021-12-01 0 5 133.333333
3 1 2022-01-01 0 5 133.333333
4 1 2022-02-01 400 5 133.333333
5 1 2022-03-01 140 4 140.000000
6 1 2022-04-01 105 3 105.000000
7 2 2021-10-01 500 2 500.000000
8 2 2021-11-01 0 1 550.000000
9 2 2021-12-01 0 1 550.000000
10 2 2022-01-01 0 1 550.000000
11 2 2022-02-01 0 1 550.000000
12 2 2022-03-01 0 1 550.000000
13 2 2022-04-01 3300 1 550.000000
Details:
- Create a group by checking for zero, the do a cumsum in reverse order to group zeroes with the next non-zero value.
- Groupby that group and transform mean to distribute that non-zero value across zeroes.