I have the following data:
id | encounter_key | datetime |
---|---|---|
1 | 111 | 2019-04-14 |
1 | 111 | 2019-04-14 |
1 | 111 | 2019-07-18 |
1 | 122 | 2019-09-02 |
2 | 211 | 2019-10-03 |
2 | 211 | 2020-10-03 |
I want to find the cumulative duration, grouped by id
and encounter_key
to achieve the following:
id | encounter_key | datetime | cum_duration_days |
---|---|---|---|
1 | 111 | 2019-04-14 | 0 |
1 | 111 | 2019-04-14 | 0 |
1 | 111 | 2019-07-18 | 95 |
1 | 122 | 2019-09-02 | 0 |
2 | 211 | 2019-10-03 | 0 |
2 | 211 | 2020-10-03 | 366 |
I've tried df.groupby(['datetime']).apply( ... )
etc. but nothing seems to work. Thanks in advance.
CodePudding user response:
I think this should work
df['cum_duration_days']=df.groupby(['id','encounter_key'])['datetime'].diff()/ np.timedelta64(1, 'D')
df['cum_duration_days'].fillna(0)
But @enke is right, the output desired seems to have an error on the '31' unless you are not showing all the rows for encounter_key=211 /id=2 ...