How to calculate time elapsed since an event occurred in a specific column

I am analyzing readings from my continuous glucose monitor as a diabetic. I have a spreadsheet where I've logged the date/time, the type of entry, and my blood glucose level. Each row represents a new entry, and entries can be of various types, for example:

DOSE_INSULIN (number of units of insulin injected),
NEW_SENSOR (recording that I swapped out CGM sensors), or
TEXT (any text-based information I wanted to note down for myself).

What I am trying to do is to create a new column that tracks the amount of time (in hours) elapsed between sensor swaps, so that I eventually can determine if there is a relationship between sensor age and BGL control/variability.

Here is an example table beforehand:

date	type
11/21/21 12:55AM	TEXT
11/21/21 1:16AM	DOSE_INSULIN
11/21/21 2:05AM	NEW_SENSOR
11/21/21 2:12AM	DOSE_INSULIN
11/21/21 2:34AM	DOSE_INSULIN
11/21/21 2:44AM	NEW_SENSOR

And here is what I would like it to look like afterwards:

date	type	hours_since_new_sensor
11/21/21 12:55AM	TEXT	NaN
11/21/21 1:16AM	DOSE_INSULIN	NaN
11/21/21 2:05AM	NEW_SENSOR	0
11/21/21 2:12AM	DOSE_INSULIN	0.12
11/21/21 2:34AM	DOSE_INSULIN	0.48
11/21/21 2:44AM	NEW_SENSOR	0

I've found a few other Stack Overflow pages with similar questions, but I am having trouble adapting them to my specific issue.

This one creates the new columns based on two separate groups.
This one uses grouper, but also groups by an ID, which is confusing me.

So far, I have only been able to get to this line:

df['date'].where(df['type'] == 'NEW_SENSOR')

As I understand it, this outputs a Series of null values, except where the type is NEW_SENSOR, where it holds the date of that event. I can't figure out how to expand on this to get what I really want, though.

Any help or advice would be greatly appreciated, thank you so much!

CodePudding user response:

Try with groupby:

import pandas as pd

df["date"] = pd.to_datetime(df["date"])
# subtract the earliest timestamp in each group; a new group starts at every NEW_SENSOR row
df["hours_since_new_sensor"] = df["date"] - df.groupby(df["type"].eq("NEW_SENSOR").cumsum())["date"].transform("min")
# reset the values before the first NEW_SENSOR to null
df["hours_since_new_sensor"] = df["hours_since_new_sensor"].where(df["type"].eq("NEW_SENSOR").cumsum() > 0)

>>> df
                 date          type hours_since_new_sensor
0 2021-11-21 00:55:00          TEXT                    NaT
1 2021-11-21 01:16:00  DOSE_INSULIN                    NaT
2 2021-11-21 02:05:00    NEW_SENSOR        0 days 00:00:00
3 2021-11-21 02:12:00  DOSE_INSULIN        0 days 00:07:00
4 2021-11-21 02:34:00  DOSE_INSULIN        0 days 00:29:00
5 2021-11-21 02:44:00    NEW_SENSOR        0 days 00:00:00
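
To see why grouping on the cumulative sum works, it can help to look at the grouping key by itself. The snippet below is a small sketch that rebuilds the question's sample rows; the variable name sensor_number is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": ["11/21/21 12:55AM", "11/21/21 1:16AM", "11/21/21 2:05AM",
             "11/21/21 2:12AM", "11/21/21 2:34AM", "11/21/21 2:44AM"],
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

# eq() is True on NEW_SENSOR rows, so the running sum goes up by one at each swap
sensor_number = df["type"].eq("NEW_SENSOR").cumsum()
print(sensor_number.tolist())  # [0, 0, 1, 1, 1, 2]
```

Rows that share a number belong to the same sensor, and group 0 holds everything logged before the first swap, which is why those rows are masked out afterwards.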

If you would like to convert the elapsed time to hours, you can do:

df["hours_since_new_sensor"] = df["hours_since_new_sensor"].dt.total_seconds().div(3600)

>>> df
                 date          type  hours_since_new_sensor
0 2021-11-21 00:55:00          TEXT                     NaN
1 2021-11-21 01:16:00  DOSE_INSULIN                     NaN
2 2021-11-21 02:05:00    NEW_SENSOR                0.000000
3 2021-11-21 02:12:00  DOSE_INSULIN                0.116667
4 2021-11-21 02:34:00  DOSE_INSULIN                0.483333
5 2021-11-21 02:44:00    NEW_SENSOR                0.000000
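
For reference, the steps above can be combined into one self-contained sketch. The DataFrame is rebuilt from the question's sample rows, and the explicit format string is an assumption based on how those rows look (month/day/two-digit year, 12-hour clock). The names new_sensor_count and elapsed are illustrative, and dividing by pd.Timedelta(hours=1) is simply another way to write the total_seconds().div(3600) step. It also assumes the rows are already sorted by date.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": ["11/21/21 12:55AM", "11/21/21 1:16AM", "11/21/21 2:05AM",
             "11/21/21 2:12AM", "11/21/21 2:34AM", "11/21/21 2:44AM"],
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

# Assumed format: month/day/2-digit year, 12-hour clock with AM/PM
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y %I:%M%p")

# Number of NEW_SENSOR rows seen so far (0 means "before the first swap")
new_sensor_count = df["type"].eq("NEW_SENSOR").cumsum()

# Time since the earliest row of each group, blanked out before the first swap
elapsed = df["date"] - df.groupby(new_sensor_count)["date"].transform("min")
elapsed = elapsed.where(new_sensor_count > 0)

# Dividing a timedelta column by a one-hour Timedelta yields float hours
df["hours_since_new_sensor"] = elapsed / pd.Timedelta(hours=1)
print(df)
```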

CodePudding user response:

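import pandas as pd

df["date"] = pd.to_datetime(df["date"])

# g numbers each stretch of rows that begins at a NEW_SENSOR entry
g = df['type'].eq('NEW_SENSOR').cumsum()
# per-group running total of the gaps between consecutive rows, in hours
df['hours_since_new_sensor'] = df.groupby(g)['date'].diff().fillna(pd.Timedelta(0)).dt.total_seconds().div(60*60).groupby(g).cumsum().round(2)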

Output (note that rows before the first NEW_SENSOR are measured from the first row instead of being NaN):

>>> df
                 date          type  hours_since_new_sensor
0 2021-11-21 00:55:00          TEXT                    0.00
1 2021-11-21 01:16:00  DOSE_INSULIN                    0.35
2 2021-11-21 02:05:00    NEW_SENSOR                    0.00
3 2021-11-21 02:12:00  DOSE_INSULIN                    0.12
4 2021-11-21 02:34:00  DOSE_INSULIN                    0.48
5 2021-11-21 02:44:00    NEW_SENSOR                    0.00
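
If the rows logged before the first sensor swap should stay NaN, as in the question's desired table, one option is to build on the where() line from the question and forward-fill it. This is only a sketch: the sample DataFrame is rebuilt from the question, the format string is an assumption about the date layout, and last_swap is an illustrative name.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": ["11/21/21 12:55AM", "11/21/21 1:16AM", "11/21/21 2:05AM",
             "11/21/21 2:12AM", "11/21/21 2:34AM", "11/21/21 2:44AM"],
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

# Assumed format: month/day/2-digit year, 12-hour clock with AM/PM
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y %I:%M%p")

# Keep the timestamp only on NEW_SENSOR rows, then carry it forward;
# rows before the first swap have nothing to carry forward and stay NaT
last_swap = df["date"].where(df["type"] == "NEW_SENSOR").ffill()

# NaT differences become NaN once converted to hours
df["hours_since_new_sensor"] = (
    (df["date"] - last_swap).dt.total_seconds() / 3600
).round(2)
print(df)
```

Because the forward fill carries the previous swap's timestamp, this version does not need a groupby, but it does assume the rows are in chronological order.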