I have a simple DataFrame like the following:
import pandas as pd

n_obs = 3
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * n_obs,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])
dd
I want to group by the index and get the cumulative product of the hazard
column. However, the last element of each group should be left out, so only
all but the last element of each group are multiplied.
Desired output:
index | hazard |
---|---|
1 | 0.3 |
1 | 0.12 |
2 | 0.2 |
2 | 0.16 |
3 | 0.6 |
3 | 0.36 |
How can I do that?
CodePudding user response:
You can use:
out = dd.groupby(level=0, group_keys=False).apply(lambda x: x.cumprod().iloc[:-1])
Or:
out = dd.groupby(level=0).apply(lambda x: x.cumprod().iloc[:-1]).droplevel(1)
output:
WTL_exploded hazard
1 0 0.30
1 0 0.12
2 0 0.20
2 0 0.16
3 0 0.60
3 0 0.36
NB. You can also use lambda x: x.cumprod().head(-1).
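Note that the apply above takes the cumulative product of every column, which is why WTL_exploded comes out as all zeros. If only hazard is needed, one possible variant is to select that column before applying. This is a minimal sketch that rebuilds the question's data; it assumes a pandas version where group_keys=False keeps the original index in the result of apply.

```python
import pandas as pd

# Same data as in the question
n_obs = 3
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * n_obs,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])

# Restrict to the hazard column, then take the cumulative product
# within each index group and drop the last row of each group.
out = (
    dd.groupby(level=0, group_keys=False)['hazard']
      .apply(lambda s: s.cumprod().iloc[:-1])
)
print(out)
```

The result is a Series rather than a DataFrame; out.to_frame() converts it back if a frame is preferred.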
CodePudding user response:
The solution I found is a bit intricate but works for the test case.
First, get rid of the last row of each group:
ff = dd.groupby(lambda x:x, as_index=False).apply(lambda x: x.iloc[:-1])
ff
Then restore the original index, group by it again, and apply pandas' cumprod:
ff.reset_index().set_index('level_1').groupby(lambda x:x).cumprod()
Is there a more direct way?
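One possibly more direct route avoids apply entirely: mark the last row of each group with Index.duplicated(keep='last'), drop those rows, and then take a grouped cumprod. This is only a sketch under the assumption that groups are defined by repeated index values, as in the example; the names not_last and out are illustrative.

```python
import pandas as pd

# Same data as in the question
n_obs = 3
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * n_obs,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])

# duplicated(keep='last') is True for every occurrence of an index value
# except its last one, so it selects all but the last row of each group.
not_last = dd.index.duplicated(keep='last')

# Drop the last row per group, then take the cumulative product per group.
out = dd.loc[not_last, 'hazard'].groupby(level=0).cumprod()
print(out)
```

A group that contains a single row is dropped completely by this mask, which matches the "all but the last element" requirement but is worth keeping in mind.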