I am analyzing readings from my continuous glucose monitor as a diabetic. I have a spreadsheet where I've logged the date/time, the type of entry, and my blood glucose level. Each row represents a new entry, and entries can be of various types, for example:
- DOSE_INSULIN (number of units of insulin injected),
- NEW_SENSOR (recording that I swapped out CGM sensors), or
- TEXT (any text-based note I wanted to leave for myself).
What I am trying to do is create a new column that tracks the time (in hours) elapsed since the most recent sensor swap, so that I can eventually determine whether there is a relationship between sensor age and BGL control/variability.
Here is an example of the table as it is now:
date | type |
---|---|
11/21/21 12:55AM | TEXT |
11/21/21 1:16AM | DOSE_INSULIN |
11/21/21 2:05AM | NEW_SENSOR |
11/21/21 2:12AM | DOSE_INSULIN |
11/21/21 2:34AM | DOSE_INSULIN |
11/21/21 2:44AM | NEW_SENSOR |
And here is what I would like it to look like afterwards:
date | type | hours_since_new_sensor |
---|---|---|
11/21/21 12:55AM | TEXT | NaN |
11/21/21 1:16AM | DOSE_INSULIN | NaN |
11/21/21 2:05AM | NEW_SENSOR | 0 |
11/21/21 2:12AM | DOSE_INSULIN | 0.12 |
11/21/21 2:34AM | DOSE_INSULIN | 0.48 |
11/21/21 2:44AM | NEW_SENSOR | 0 |
I've found a few other Stack Overflow pages with similar questions, but I am having trouble adapting them to my specific issue.
- This one creates the new columns based on two separate groups.
- This one uses grouper, but also groups by an ID, which is confusing me.
So far, I have only been able to get to this line:
df['date'].where(df['type'] == 'NEW_SENSOR')
As I understand it, this outputs a Series of null values, except where the type is NEW_SENSOR, where it gives the date of that event. I can't figure out how to build on this to get what I actually want, though.
Any help or advice would be greatly appreciated, thank you so much!
CodePudding user response:
Try with groupby:
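import pandas as pd
df["date"] = pd.to_datetime(df["date"])
# Running count of NEW_SENSOR rows: each swap and the rows after it share one group number
sensor_group = df["type"].eq("NEW_SENSOR").cumsum()
# Time elapsed since the earliest row of each group (the swap itself, for groups >= 1)
df["hours_since_new_sensor"] = df["date"] - df.groupby(sensor_group)["date"].transform("min")
# Reset the values before the first NEW_SENSOR to null
df["hours_since_new_sensor"] = df["hours_since_new_sensor"].where(sensor_group > 0)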
>>> df
date type hours_since_new_sensor
0 2021-11-21 00:55:00 TEXT NaT
1 2021-11-21 01:16:00 DOSE_INSULIN NaT
2 2021-11-21 02:05:00 NEW_SENSOR 0 days 00:00:00
3 2021-11-21 02:12:00 DOSE_INSULIN 0 days 00:07:00
4 2021-11-21 02:34:00 DOSE_INSULIN 0 days 00:29:00
5 2021-11-21 02:44:00 NEW_SENSOR 0 days 00:00:00
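To see why this works, it can help to look at the grouping key on its own. The following is only a sketch: it rebuilds the six sample rows from the question by hand, and the names `sensor_id` and `sensor_start` are illustrative, not part of the original answer. It also assumes the rows are already in chronological order.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-11-21 00:55", "2021-11-21 01:16", "2021-11-21 02:05",
        "2021-11-21 02:12", "2021-11-21 02:34", "2021-11-21 02:44",
    ]),
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

# eq() is True on NEW_SENSOR rows; cumsum() turns that into a running counter,
# so the key is 0 before the first swap and increases by 1 at every swap.
sensor_id = df["type"].eq("NEW_SENSOR").cumsum()

# The earliest timestamp inside each block of rows that share a counter value.
# For blocks 1, 2, ... that is the time of the swap itself.
sensor_start = df.groupby(sensor_id)["date"].transform("min")

print(pd.DataFrame({"sensor_id": sensor_id, "sensor_start": sensor_start}))
```

Rows with `sensor_id` equal to 0 have no preceding swap, which is why the answer masks them afterwards.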
If you would like to convert the elapsed time to hours, you can do:
df["hours_since_new_sensor"] = df["hours_since_new_sensor"].dt.total_seconds().div(3600)
>>> df
date type hours_since_new_sensor
0 2021-11-21 00:55:00 TEXT NaN
1 2021-11-21 01:16:00 DOSE_INSULIN NaN
2 2021-11-21 02:05:00 NEW_SENSOR 0.000000
3 2021-11-21 02:12:00 DOSE_INSULIN 0.116667
4 2021-11-21 02:34:00 DOSE_INSULIN 0.483333
5 2021-11-21 02:44:00 NEW_SENSOR 0.000000
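As an alternative, the `where` expression from the question can probably be extended directly by forward-filling the swap times. This is a sketch under the same assumptions (rows sorted by time, sample data rebuilt by hand); `swap_time` and `last_swap` are illustrative names.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-11-21 00:55", "2021-11-21 01:16", "2021-11-21 02:05",
        "2021-11-21 02:12", "2021-11-21 02:34", "2021-11-21 02:44",
    ]),
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

# The line from the question: swap timestamp on NEW_SENSOR rows, NaT elsewhere.
swap_time = df["date"].where(df["type"] == "NEW_SENSOR")

# Carry each swap time forward to the rows that follow it.
# Rows before the first swap have nothing to carry forward and stay NaT.
last_swap = swap_time.ffill()

# Timedelta since the most recent swap, converted to hours (NaT becomes NaN).
df["hours_since_new_sensor"] = (df["date"] - last_swap).dt.total_seconds() / 3600

print(df)
```

This avoids the groupby entirely, and the rows before the first NEW_SENSOR come out as NaN without a separate masking step.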
CodePudding user response:
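df["date"] = pd.to_datetime(df["date"])
# Group key: running count of NEW_SENSOR rows (0 for rows before the first swap)
g = df['type'].eq('NEW_SENSOR').cumsum()
# Gaps between consecutive rows within each group, in hours, accumulated per group
df['hours_since_new_sensor'] = df.groupby(g)['date'].diff().fillna(pd.Timedelta(0)).dt.total_seconds().div(60*60).groupby(g).cumsum().round(2)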
Output:
>>> df
date type hours_since_new_sensor
0 2021-11-21 00:55:00 TEXT 0.00
1 2021-11-21 01:16:00 DOSE_INSULIN 0.35
2 2021-11-21 02:05:00 NEW_SENSOR 0.00
3 2021-11-21 02:12:00 DOSE_INSULIN 0.12
4 2021-11-21 02:34:00 DOSE_INSULIN 0.48
5 2021-11-21 02:44:00 NEW_SENSOR 0.00
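Note that in this output the first two rows are measured from the first row of the table (0.00 and 0.35) rather than left empty as in the question's desired table, because they fall into group 0. If NaN is wanted there, one option is to mask that group afterwards. A minimal sketch, with the sample data rebuilt by hand and `hours` as an illustrative intermediate name:

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-11-21 00:55", "2021-11-21 01:16", "2021-11-21 02:05",
        "2021-11-21 02:12", "2021-11-21 02:34", "2021-11-21 02:44",
    ]),
    "type": ["TEXT", "DOSE_INSULIN", "NEW_SENSOR",
             "DOSE_INSULIN", "DOSE_INSULIN", "NEW_SENSOR"],
})

g = df["type"].eq("NEW_SENSOR").cumsum()

# Same chain as in the answer, split over several lines for readability.
hours = (
    df.groupby(g)["date"].diff()      # gap to the previous row in the same group
      .fillna(pd.Timedelta(0))        # first row of each group has no previous row
      .dt.total_seconds()
      .div(60 * 60)                   # seconds to hours
      .groupby(g).cumsum()            # accumulate the gaps within each group
      .round(2)
)

# g == 0 marks rows logged before any NEW_SENSOR entry; blank those out.
df["hours_since_new_sensor"] = hours.where(g > 0)

print(df)
```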