Home > OS >  Filling up orders based on eventdate per ID in R II
Filling up orders based on eventdate per ID in R II

Time:09-12

I had a similar question to this previously, and thanks to Ronak Shah, I could solve the problem. However, now I have run into a new problem.

I have the following table. I would like to make a new column called order so that the day when eventdate is not NA, the order of it is zero. And then, I would like to fill up negative consecutively decreasing integers to the previous rows and positive consecutive increasing integers to the below rows per ID. But then I only need the previous three rows to the event date and three rows after the event date, which makes seven rows per eventdate, since the seven rows include the eventdate itself.

Please note that there could be multiple eventdates per ID.

 ---- ------------ ------- 
| ID | eventdate  | Value |
 ---- ------------ ------- 
|  1 | NA         |    10 |
|  1 | NA         |    11 |
|  1 | NA         |    12 |
|  1 | NA         |    11 |
|  1 | 2011-03-18 |    15 |
|  1 | NA         |    17 |
|  1 | NA         |    18 |
|  1 | NA         |    15 |
|  1 | NA         |    20 |
|  1 | NA         |    21 |
|  1 | NA         |    22 |
|  1 | 2011-06-20 |    25 |
|  1 | NA         |    26 |
|  1 | NA         |    25 |
|  1 | NA         |    26 |
|  2 | NA         |     5 |
|  2 | NA         |     6 |
|  2 | NA         |     7 |
|  2 | 2011-05-28 |     9 |
|  2 | NA         |    10 |
|  2 | NA         |    11 |
|  2 | NA         |    15 |
|  2 | NA         |    16 |
|  3 | NA         |    20 |
|  3 | NA         |    22 |
|  3 | NA         |    23 |
|  3 | NA         |    24 |
|  3 | 2012-05-28 |    28 |
|  3 | NA         |    29 |
|  3 | NA         |    25 |
|  3 | NA         |    24 |
|  3 | NA         |    26 |
|  3 | NA         |    24 |
 ---- ------------ ------- 

Below is the desired output from the above table.

 ---- ------------ ------- ------- 
| ID | eventdate  | Value | order |
 ---- ------------ ------- ------- 
|  1 | NA         |    11 |    -3 |
|  1 | NA         |    12 |    -2 |
|  1 | NA         |    11 |    -1 |
|  1 | 2011-03-18 |    15 |     0 |
|  1 | NA         |    17 |     1 |
|  1 | NA         |    18 |     2 |
|  1 | NA         |    15 |     3 |
|  1 | NA         |    20 |    -3 |
|  1 | NA         |    21 |    -2 |
|  1 | NA         |    22 |    -1 |
|  1 | 2011-06-20 |    25 |     0 |
|  1 | NA         |    26 |     1 |
|  1 | NA         |    25 |     2 |
|  1 | NA         |    26 |     3 |
|  2 | NA         |     5 |    -3 |
|  2 | NA         |     6 |    -2 |
|  2 | NA         |     7 |    -1 |
|  2 | 2011-05-28 |     9 |     0 |
|  2 | NA         |    10 |     1 |
|  2 | NA         |    11 |     2 |
|  2 | NA         |    15 |     3 |
|  3 | NA         |    22 |    -3 |
|  3 | NA         |    23 |    -2 |
|  3 | NA         |    24 |    -1 |
|  3 | 2012-05-28 |    28 |     0 |
|  3 | NA         |    29 |     1 |
|  3 | NA         |    25 |     2 |
|  3 | NA         |    24 |     3 |
 ---- ------------ ------- ------- 

Thank you very much in advance!

CodePudding user response:

Here is one option where it considers multiple non-NA 'eventdate' per 'ID' - create a function that loops over the index of non-NA elements in 'eventdate' to return the sequence -3:3 for the row indexes

library(purrr)
library(dplyr)
f1 <- function(eventd) {
     tmp <- rep(NA_integer_, length(eventd))
      i1 <- which(complete.cases(eventd))
      d1 <- purrr::map_dfr(i1, ~ tibble(val = -3:3, ind = (.x   val) 
) %>% 
      filter(ind >=1 & ind <= length(eventd)) %>%
      distinct(ind, .keep_all = TRUE))
      replace(tmp, d1$ind, d1$val)

}
out <- df1 %>% 
  group_by(ID) %>%
  mutate(order = f1(eventdate)) %>%
  ungroup %>% 
  filter(complete.cases(order))

-output

as.data.frame(out)
  ID  eventdate Value order
1   1       <NA>    11    -3
2   1       <NA>    12    -2
3   1       <NA>    11    -1
4   1 2011-03-18    15     0
5   1       <NA>    17     1
6   1       <NA>    18     2
7   1       <NA>    15     3
8   1       <NA>    20    -3
9   1       <NA>    21    -2
10  1       <NA>    22    -1
11  1 2011-06-20    25     0
12  1       <NA>    26     1
13  1       <NA>    25     2
14  1       <NA>    26     3
15  2       <NA>     5    -3
16  2       <NA>     6    -2
17  2       <NA>     7    -1
18  2 2011-05-28     9     0
19  2       <NA>    10     1
20  2       <NA>    11     2
21  2       <NA>    15     3
22  3       <NA>    22    -3
23  3       <NA>    23    -2
24  3       <NA>    24    -1
25  3 2012-05-28    28     0
26  3       <NA>    29     1
27  3       <NA>    25     2
28  3       <NA>    24     3

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), eventdate = structure(c(NA, NA, 
NA, NA, 15051L, NA, NA, NA, NA, NA, NA, 15145L, NA, NA, NA, NA, 
NA, NA, 15122L, NA, NA, NA, NA, NA, NA, NA, NA, 15488L, NA, NA, 
NA, NA, NA), class = c("IDate", "Date")), Value = c(10L, 11L, 
12L, 11L, 15L, 17L, 18L, 15L, 20L, 21L, 22L, 25L, 26L, 25L, 26L, 
5L, 6L, 7L, 9L, 10L, 11L, 15L, 16L, 20L, 22L, 23L, 24L, 28L, 
29L, 25L, 24L, 26L, 24L)), row.names = c(NA, -33L), class = "data.frame")
  • Related