Home > Mobile >  Ignoring count reset at noon
Ignoring count reset at noon

Time:04-21

This is a follow-up question for this question.

To recapitulate: I have a dataframe data like this:

> data
   ID measurement_type    measurement_time amount          entry_time
1   1           type_1 2014-06-17 04:00:00      1 2014-06-17 01:53:00
2   1           type_1 2014-06-17 11:52:00      2 2014-06-17 01:53:00
3   1           type_1 2014-06-17 18:58:00      1 2014-06-17 01:53:00
4   1           type_1 2014-06-18 02:05:00      2 2014-06-17 01:53:00
5   1           type_1 2014-06-18 08:00:00      3 2014-06-17 01:53:00
6   1           type_2 2014-06-17 05:27:00     11 2014-06-17 01:53:00
7   1           type_2 2014-06-17 11:10:00     22 2014-06-17 01:53:00
8   1           type_2 2014-06-17 17:02:00     11 2014-06-17 01:53:00
9   1           type_2 2014-06-17 23:56:00     22 2014-06-17 01:53:00
10  1           type_2 2014-06-18 07:01:00     33 2014-06-17 01:53:00
11  2           type_1 2014-07-03 16:01:00    111 2014-07-03 14:35:00
12  2           type_1 2014-07-03 19:19:00    222 2014-07-03 14:35:00
13  2           type_1 2014-07-03 23:55:00    333 2014-07-03 14:35:00
14  2           type_1 2014-07-04 08:08:00    444 2014-07-03 14:35:00
15  2           type_1 2014-07-04 13:55:00    111 2014-07-03 14:35:00
16  2           type_2 2014-07-03 22:12:00   1111 2014-07-03 14:35:00
17  2           type_2 2014-07-04 08:59:00   2222 2014-07-03 14:35:00
18  2           type_2 2014-07-04 14:10:00   1111 2014-07-03 14:35:00
19  2           type_2 2014-07-04 17:00:00   2222 2014-07-03 14:35:00
20  2           type_2 2014-07-04 23:00:00   3333 2014-07-03 14:35:00

The Subjects with ID 1 and ID 2 enter at a specified entry_time and thereafter, cumulative amounts are measured at specific measurement_times. However, each day at noon, the amount is set back to zero again and counting starts again (from zero). What I would like to achieve is that once a break at noon happens (and therefore a reset to zero), it keeps adding the new newly starting cumulative amount to that already accumulated before noon (grouped by the grouping variable measurement_type).

For the noon breaks, the answer provided in the aforementioned link works perfectly:

library(dplyr)

data %>% as_tibble() %>%
  # Check 12 hours passed --> `pm` column
  mutate(pm = format(measurement_time, "%H") >= 12) %>%
  mutate(date_fct = format(measurement_time, "%Y_%d")) %>%
  # Group by ID and `pm`
  group_by(ID, measurement_type, date_fct, pm) %>%
  # Turn cumsum into actual values
  mutate(amount_act = amount - lag(amount, default = 0)) %>%
  # Cumsum over ID
  ungroup() %>%
  group_by(ID, measurement_type) %>%
  mutate(amount_cums = cumsum(amount_act)) %>%
  ungroup() %>%
  select(-c(pm, date_fct, amount_act))

# A tibble: 20 x 6
   ID    measurement_type measurement_time    amount entry_time          amount_cums
   <fct> <fct>            <dttm>               <dbl> <dttm>                    <dbl>
 1 1     type_1           2014-06-17 04:00:00      1 2014-06-17 01:53:00           1
 2 1     type_1           2014-06-17 11:52:00      2 2014-06-17 01:53:00           2
 3 1     type_1           2014-06-17 18:58:00      1 2014-06-17 01:53:00           3
 4 1     type_1           2014-06-18 02:05:00      2 2014-06-17 01:53:00           5
 5 1     type_1           2014-06-18 08:00:00      3 2014-06-17 01:53:00           6
 6 1     type_2           2014-06-17 05:27:00     11 2014-06-17 01:53:00          11
 7 1     type_2           2014-06-17 11:10:00     22 2014-06-17 01:53:00          22
 8 1     type_2           2014-06-17 17:02:00     11 2014-06-17 01:53:00          33
 9 1     type_2           2014-06-17 23:56:00     22 2014-06-17 01:53:00          44
10 1     type_2           2014-06-18 07:01:00     33 2014-06-17 01:53:00          77
11 2     type_1           2014-07-03 16:01:00    111 2014-07-03 14:35:00         111
12 2     type_1           2014-07-03 19:19:00    222 2014-07-03 14:35:00         222
13 2     type_1           2014-07-03 23:55:00    333 2014-07-03 14:35:00         333
14 2     type_1           2014-07-04 08:08:00    444 2014-07-03 14:35:00         777
15 2     type_1           2014-07-04 13:55:00    111 2014-07-03 14:35:00         888
16 2     type_2           2014-07-03 22:12:00   1111 2014-07-03 14:35:00        1111
17 2     type_2           2014-07-04 08:59:00   2222 2014-07-03 14:35:00        3333
18 2     type_2           2014-07-04 14:10:00   1111 2014-07-03 14:35:00        4444
19 2     type_2           2014-07-04 17:00:00   2222 2014-07-03 14:35:00        5555
20 2     type_2           2014-07-04 23:00:00   3333 2014-07-03 14:35:00        6666

As you can see, afternoon-counts get correctly added to before-noon-counts. However, midnight breaks incorrectly add the values of the next day (cumulative since noon) to the cumulative amounts (amount_cums) of the previous day, due to the grouping per day (date_fct in the provided code).

Any help is greatly appreciated to get the desired output for the amount_cums as follows:

# A tibble: 20 x 6
ID    measurement_type measurement_time    amount entry_time          amount_cums
<fct> <fct>            <dttm>               <dbl> <dttm>                    <dbl>
1     type_1           2014-06-17 04:00:00      1 2014-06-17 01:53:00           1
1     type_1           2014-06-17 11:52:00      2 2014-06-17 01:53:00           2
1     type_1           2014-06-17 18:58:00      1 2014-06-17 01:53:00           3
1     type_1           2014-06-18 02:05:00      2 2014-06-17 01:53:00           4
1     type_1           2014-06-18 08:00:00      3 2014-06-17 01:53:00           5
1     type_2           2014-06-17 05:27:00     11 2014-06-17 01:53:00          11
1     type_2           2014-06-17 11:10:00     22 2014-06-17 01:53:00          22
1     type_2           2014-06-17 17:02:00     11 2014-06-17 01:53:00          33
1     type_2           2014-06-17 23:56:00     22 2014-06-17 01:53:00          44
1     type_2           2014-06-18 07:01:00     33 2014-06-17 01:53:00          55
2     type_1           2014-07-03 16:01:00    111 2014-07-03 14:35:00         111
2     type_1           2014-07-03 19:19:00    222 2014-07-03 14:35:00         222
2     type_1           2014-07-03 23:55:00    333 2014-07-03 14:35:00         333
2     type_1           2014-07-04 08:08:00    444 2014-07-03 14:35:00         444
2     type_1           2014-07-04 13:55:00    111 2014-07-03 14:35:00         555
2     type_2           2014-07-03 22:12:00   1111 2014-07-03 14:35:00        1111
2     type_2           2014-07-04 08:59:00   2222 2014-07-03 14:35:00        2222
2     type_2           2014-07-04 14:10:00   1111 2014-07-03 14:35:00        3333
2     type_2           2014-07-04 17:00:00   2222 2014-07-03 14:35:00        4444
2     type_2           2014-07-04 23:00:00   3333 2014-07-03 14:35:00        5555

Data

data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
measurement_type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("type_1", "type_2"), class = "factor"), 
measurement_time = structure(c(1402970400, 1402998720, 1403024280, 1403049900, 1403071200,  1402975620, 1402996200, 1403017320, 1403042160, 1403067660, 
1404396060, 1404407940, 1404424500, 1404454080, 1404474900,  1404418320, 1404457140, 1404475800, 1404486000, 1404507600), class = c("POSIXct", "POSIXt"), tzone = ""), 
amount = c(1, 2, 1, 2, 3, 11, 22, 11, 22, 33, 111, 222, 333, 444, 111, 1111, 2222, 1111, 2222, 3333), 
entry_time = structure(c(1402962780, 1402962780, 1402962780, 1402962780, 1402962780,1402962780, 1402962780, 1402962780, 1402962780, 1402962780, 
1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900, 1404390900), 
class = c("POSIXct", "POSIXt"), tzone = "CET")), class = "data.frame", row.names = c(NA, -20L))

CodePudding user response:

Here's an approach where I identify the most recent noon, add a helper column which captures the last measurement per measuring day, and finally add the prior days' last measures to each value.

library(dplyr); library(lubridate)
data %>%
  # arrange(ID, measurement_type, measurement_time) %>%

  # I needed to adjust the times since they loaded in my local time
  mutate(measurement_time = measurement_time   dhours(9)) %>% 
  
  # identify the most recent noon
  mutate(start_of_count_day = floor_date(measurement_time - dhours(12), "day")   dhours(12)) %>%
  group_by(ID, measurement_type, start_of_count_day) %>%
  mutate(day_ttl = if_else(row_number() == max(row_number()), amount, 0)) %>%
  group_by(ID, measurement_type) %>%
  mutate(cuml = amount   cumsum(lag(day_ttl, default = 0))) %>%
  ungroup()

Result

# A tibble: 20 × 8
   ID    measurement_type measurement_time    amount entry_time          start_of_count_day  day_ttl  cuml
   <fct> <fct>            <dttm>               <dbl> <dttm>              <dttm>                <dbl> <dbl>
 1 1     type_1           2014-06-17 04:00:00      1 2014-06-16 16:53:00 2014-06-16 12:00:00       0     1
 2 1     type_1           2014-06-17 11:52:00      2 2014-06-16 16:53:00 2014-06-16 12:00:00       2     2
 3 1     type_1           2014-06-17 18:58:00      1 2014-06-16 16:53:00 2014-06-17 12:00:00       0     3
 4 1     type_1           2014-06-18 02:05:00      2 2014-06-16 16:53:00 2014-06-17 12:00:00       0     4
 5 1     type_1           2014-06-18 08:00:00      3 2014-06-16 16:53:00 2014-06-17 12:00:00       3     5
 6 1     type_2           2014-06-17 05:27:00     11 2014-06-16 16:53:00 2014-06-16 12:00:00       0    11
 7 1     type_2           2014-06-17 11:10:00     22 2014-06-16 16:53:00 2014-06-16 12:00:00      22    22
 8 1     type_2           2014-06-17 17:02:00     11 2014-06-16 16:53:00 2014-06-17 12:00:00       0    33
 9 1     type_2           2014-06-17 23:56:00     22 2014-06-16 16:53:00 2014-06-17 12:00:00       0    44
10 1     type_2           2014-06-18 07:01:00     33 2014-06-16 16:53:00 2014-06-17 12:00:00      33    55
11 2     type_1           2014-07-03 16:01:00    111 2014-07-03 05:35:00 2014-07-03 12:00:00       0   111
12 2     type_1           2014-07-03 19:19:00    222 2014-07-03 05:35:00 2014-07-03 12:00:00       0   222
13 2     type_1           2014-07-03 23:55:00    333 2014-07-03 05:35:00 2014-07-03 12:00:00       0   333
14 2     type_1           2014-07-04 08:08:00    444 2014-07-03 05:35:00 2014-07-03 12:00:00     444   444
15 2     type_1           2014-07-04 13:55:00    111 2014-07-03 05:35:00 2014-07-04 12:00:00     111   555
16 2     type_2           2014-07-03 22:12:00   1111 2014-07-03 05:35:00 2014-07-03 12:00:00       0  1111
17 2     type_2           2014-07-04 08:59:00   2222 2014-07-03 05:35:00 2014-07-03 12:00:00    2222  2222
18 2     type_2           2014-07-04 14:10:00   1111 2014-07-03 05:35:00 2014-07-04 12:00:00       0  3333
19 2     type_2           2014-07-04 17:00:00   2222 2014-07-03 05:35:00 2014-07-04 12:00:00       0  4444
20 2     type_2           2014-07-04 23:00:00   3333 2014-07-03 05:35:00 2014-07-04 12:00:00    3333  5555
  • Related