This is a follow-up question for this question.
To recapitulate: I have a dataframe data
like this:
> data
ID measurement_type measurement_time amount entry_time
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00
The Subjects with ID 1
and ID 2
enter at a specified entry_time
and thereafter, cumulative amount
s are measured at specific measurement_times
. However, each day at noon, the amount is set back to zero again and counting starts again (from zero). What I would like to achieve is that once a break at noon happens (and therefore a reset to zero), it keeps adding the new newly starting cumulative amount to that already accumulated before noon (grouped by the grouping variable measurement_type
).
For the noon breaks, the answer provided in the aforementioned link works perfectly:
library(dplyr)
data %>% as_tibble() %>%
# Check 12 hours passed --> `pm` column
mutate(pm = format(measurement_time, "%H") >= 12) %>%
mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
# Group by ID and `pm`
group_by(ID, measurement_type, date_fct, pm) %>%
# Turn cumsum into actual values
mutate(amount_act = amount - lag(amount, default = 0)) %>%
# Cumsum over ID
ungroup() %>%
group_by(ID, measurement_type) %>%
mutate(amount_cums = cumsum(amount_act)) %>%
ungroup() %>%
select(-c(pm, date_fct, amount_act))
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 5
5 1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 6
6 1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 77
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 777
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 888
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 3333
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 4444
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 5555
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 6666
As you can see, afternoon-counts get correctly added to before-noon-counts. However, midnight breaks incorrectly add the values of the next day (cumulative since noon) to the cumulative amounts (amount_cums
) of the previous day, due to the grouping per day (date_fct
in the provided code).
Any help is greatly appreciated to get the desired output for the amount_cums
as follows:
# A tibble: 20 x 6
ID measurement_type measurement_time amount entry_time amount_cums
<fct> <fct> <dttm> <dbl> <dttm> <dbl>
1 type_1 2014-06-17 04:00:00 1 2014-06-17 01:53:00 1
1 type_1 2014-06-17 11:52:00 2 2014-06-17 01:53:00 2
1 type_1 2014-06-17 18:58:00 1 2014-06-17 01:53:00 3
1 type_1 2014-06-18 02:05:00 2 2014-06-17 01:53:00 4
1 type_1 2014-06-18 08:00:00 3 2014-06-17 01:53:00 5
1 type_2 2014-06-17 05:27:00 11 2014-06-17 01:53:00 11
1 type_2 2014-06-17 11:10:00 22 2014-06-17 01:53:00 22
1 type_2 2014-06-17 17:02:00 11 2014-06-17 01:53:00 33
1 type_2 2014-06-17 23:56:00 22 2014-06-17 01:53:00 44
1 type_2 2014-06-18 07:01:00 33 2014-06-17 01:53:00 55
2 type_1 2014-07-03 16:01:00 111 2014-07-03 14:35:00 111
2 type_1 2014-07-03 19:19:00 222 2014-07-03 14:35:00 222
2 type_1 2014-07-03 23:55:00 333 2014-07-03 14:35:00 333
2 type_1 2014-07-04 08:08:00 444 2014-07-03 14:35:00 444
2 type_1 2014-07-04 13:55:00 111 2014-07-03 14:35:00 555
2 type_2 2014-07-03 22:12:00 1111 2014-07-03 14:35:00 1111
2 type_2 2014-07-04 08:59:00 2222 2014-07-03 14:35:00 2222
2 type_2 2014-07-04 14:10:00 1111 2014-07-03 14:35:00 3333
2 type_2 2014-07-04 17:00:00 2222 2014-07-03 14:35:00 4444
2 type_2 2014-07-04 23:00:00 3333 2014-07-03 14:35:00 5555
Data
data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"),
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200, 1402975620, 1402996200, 1403017320, 1403042160, 1403067660,
1404396060, 1404407940, 1404424500, 1404454080, 1404474900, 1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""),
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333),
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780,
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900),
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))
CodePudding user response:
Here's an approach where I identify the most recent noon, add a helper column which captures the last measurement per measuring day, and finally add the prior days' last measures to each value.
library(dplyr); library(lubridate)
data %>%
# arrange(ID, measurement_type, measurement_time) %>%
# I needed to adjust the times since they loaded in my local time
mutate(measurement_time = measurement_time dhours(9)) %>%
# identify the most recent noon
mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day") dhours(12)) %>%
group_by(ID, measurement_type, start_of_count_day) %>%
mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
group_by(ID, measurement_type) %>%
mutate(cuml = amount cumsum(lag(day_ttl, default = 0))) %>%
ungroup()
Result
# A tibble: 20 × 8
ID measurement_type measurement_time amount entry_time start_of_count_day day_ttl cuml
<fct> <fct> <dttm> <dbl> <dttm> <dttm> <dbl> <dbl>
1 1 type_1 2014-06-17 04:00:00 1 2014-06-16 16:53:00 2014-06-16 12:00:00 0 1
2 1 type_1 2014-06-17 11:52:00 2 2014-06-16 16:53:00 2014-06-16 12:00:00 2 2
3 1 type_1 2014-06-17 18:58:00 1 2014-06-16 16:53:00 2014-06-17 12:00:00 0 3
4 1 type_1 2014-06-18 02:05:00 2 2014-06-16 16:53:00 2014-06-17 12:00:00 0 4
5 1 type_1 2014-06-18 08:00:00 3 2014-06-16 16:53:00 2014-06-17 12:00:00 3 5
6 1 type_2 2014-06-17 05:27:00 11 2014-06-16 16:53:00 2014-06-16 12:00:00 0 11
7 1 type_2 2014-06-17 11:10:00 22 2014-06-16 16:53:00 2014-06-16 12:00:00 22 22
8 1 type_2 2014-06-17 17:02:00 11 2014-06-16 16:53:00 2014-06-17 12:00:00 0 33
9 1 type_2 2014-06-17 23:56:00 22 2014-06-16 16:53:00 2014-06-17 12:00:00 0 44
10 1 type_2 2014-06-18 07:01:00 33 2014-06-16 16:53:00 2014-06-17 12:00:00 33 55
11 2 type_1 2014-07-03 16:01:00 111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 111
12 2 type_1 2014-07-03 19:19:00 222 2014-07-03 05:35:00 2014-07-03 12:00:00 0 222
13 2 type_1 2014-07-03 23:55:00 333 2014-07-03 05:35:00 2014-07-03 12:00:00 0 333
14 2 type_1 2014-07-04 08:08:00 444 2014-07-03 05:35:00 2014-07-03 12:00:00 444 444
15 2 type_1 2014-07-04 13:55:00 111 2014-07-03 05:35:00 2014-07-04 12:00:00 111 555
16 2 type_2 2014-07-03 22:12:00 1111 2014-07-03 05:35:00 2014-07-03 12:00:00 0 1111
17 2 type_2 2014-07-04 08:59:00 2222 2014-07-03 05:35:00 2014-07-03 12:00:00 2222 2222
18 2 type_2 2014-07-04 14:10:00 1111 2014-07-03 05:35:00 2014-07-04 12:00:00 0 3333
19 2 type_2 2014-07-04 17:00:00 2222 2014-07-03 05:35:00 2014-07-04 12:00:00 0 4444
20 2 type_2 2014-07-04 23:00:00 3333 2014-07-03 05:35:00 2014-07-04 12:00:00 3333 5555