Cumulative product by group without groups' last row in pandas

Time:10-27

I have a simple dataframe as the following:

import pandas as pd

n_obs = 3
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * n_obs,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])
dd

I want to group by the index and compute the cumulative product of the hazard column, but the last row of each group should be left out of the result.

Desired output:

index hazard
1 0.3
1 0.12
2 0.2
2 0.16
3 0.6
3 0.36

How can I do that?

CodePudding user response:

You can use:

out = dd.groupby(level=0, group_keys=False).apply(lambda x: x.cumprod().iloc[:-1])

Or:

out = dd.groupby(level=0).apply(lambda x: x.cumprod().iloc[:-1]).droplevel(1)

output:

   WTL_exploded  hazard
1             0    0.30
1             0    0.12
2             0    0.20
2             0    0.16
3             0    0.60
3             0    0.36

NB. you can also use lambda x: x.cumprod().head(-1).
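Note that the output above also takes the cumulative product of WTL_exploded, which is why that column is all zeros. If only hazard is wanted, as in the desired output, a minimal sketch is to select the column before applying. It reuses the question's dd and assumes a reasonably recent pandas where group_keys=False is honored by apply:

```python
import pandas as pd

# Same data as in the question
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * 3,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])

# Restrict to the hazard column, take the running product per group,
# then drop each group's last row
out = (
    dd.groupby(level=0, group_keys=False)['hazard']
      .apply(lambda s: s.cumprod().iloc[:-1])
)
print(out)
```

The result here is a Series rather than a DataFrame; out.to_frame() converts it back if a DataFrame is needed.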

CodePudding user response:

The solution I found is a bit intricate but works for the test case.

First, get rid of the last row of each group:

ff = dd.groupby(lambda x:x, as_index=False).apply(lambda x: x.iloc[:-1])
ff

Then, restore the original index, group-by again and use pandas cumprod:

ff.reset_index().set_index('level_1').groupby(lambda x:x).cumprod()

Is there a more direct way?
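One possibly more direct route avoids apply altogether: compute the grouped cumulative product for every row, then mask out the last row of each group. The sketch below assumes the rows of each group are already in the intended order; the names cum and is_last are only illustrative.

```python
import pandas as pd

# Same data as in the question
dd = pd.DataFrame({
    'WTL_exploded': [0, 1, 2] * 3,
    'hazard': [0.3, 0.4, 0.5, 0.2, 0.8, 0.9, 0.6, 0.6, 0.65],
}, index=[1, 1, 1, 2, 2, 2, 3, 3, 3])

g = dd.groupby(level=0)

# Running product of hazard within each group (same length as dd)
cum = g['hazard'].cumprod()

# cumcount(ascending=False) numbers rows from the end of each group,
# so 0 marks the last row of a group
is_last = g.cumcount(ascending=False) == 0

# Use a plain NumPy mask so the duplicated index labels need no alignment
out = cum[~is_last.to_numpy()]
print(out)
```

Because the last row's value never feeds into the earlier running products, dropping it after cumprod gives the same values as dropping it first, which should reproduce the six values in the desired output.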
