I have a data frame where I need to pair consecutive events within the same day and subtract the later from the former. Each has a timestamp and a date.
time date event score
0 2022-03-07 06:45:00 00:00 2022-03-07 light 80.066667
1 2022-03-07 18:12:00 00:00 2022-03-07 dark 79.857667
2 2022-03-30 06:25:00 00:00 2022-03-30 light 107.060833
3 2022-03-30 13:38:00 00:00 2022-03-30 dark 105.324000
4 2022-03-30 13:40:00 00:00 2022-03-30 dark 105.239750
5 2022-03-30 15:47:00 00:00 2022-03-30 light 106.863143
6 2022-04-01 06:25:00 00:00 2022-04-01 light 101.271867
I have tried spreading the data frame using
df = df.pivot(index='time', columns='event', values='score')
event light dark
time
2022-03-07 06:45:00 00:00 80.066667 NaN
2022-03-07 18:12:00 00:00 NaN 79.857667
2022-03-30 06:25:00 00:00 107.060833 NaN
2022-03-30 13:38:00 00:00 NaN 105.324000
2022-03-30 13:40:00 00:00 NaN 105.239750
2022-03-30 15:47:00 00:00 106.863143 NaN
2022-04-01 06:25:00 00:00 101.271867 NaN
however because the events happen at different times the spread data frame has NaNs. I would ideally end up with this, where I keep the time of the first occurrence in the pair (light or dark), align the events (note: corresponding dark to match light as not yet occurred for 2022-04-01) and when light comes first I subtract the later value from the former and when dark comes first I subtract the former value from the later.
event light dark diff
time
2022-03-07 06:45:00 00:00 80.066667 79.857667 -0.208999
2022-03-30 06:25:00 00:00 107.060833 105.324000 -1.7368
2022-03-30 13:40:00 00:00 106.863143 105.239750 -1.6233
2022-04-01 06:25:00 00:00 101.271867 NaN NaN
CodePudding user response:
Here's one way using groupby
cumcount
to create groups, then use that groups in groupby
first
to get the first time each event happens each day. Then pivot
.
Finally, use diff
to get the difference between "light" and "dark" and assign
the differences to column "diff" in df
:
out = (df.assign(time=df.groupby(df.groupby('event').cumcount())['time'].transform('first'))
.pivot('time', 'event', 'score').reset_index().rename_axis([None], axis=1)
.assign(diff=lambda x: x['dark']-x['light']))
Output:
time dark light diff
0 2022-03-07 06:45:00 00:00 79.857667 80.066667 -0.209000
1 2022-03-30 06:25:00 00:00 105.324000 107.060833 -1.736833
2 2022-03-30 13:40:00 00:00 105.239750 106.863143 -1.623393
3 2022-04-01 06:25:00 00:00 NaN 101.271867 NaN