I would like to ask a simple question about filling up orders based on eventdate per ID.
I have the following table. I would like to make a new column called order so that the day when eventdate is not NA, the order of it is zero. And then, I would like to fill up negative consecutively decreasing integers to the previous rows and positive consecutive increasing integers to the below rows per ID.
---- ------------ -------
| ID | eventdate | Value |
---- ------------ -------
| 1 | NA | 10 |
| 1 | NA | 11 |
| 1 | NA | 12 |
| 1 | NA | 11 |
| 1 | 2011-03-18 | 15 |
| 1 | NA | 17 |
| 1 | NA | 18 |
| 1 | NA | 15 |
| 2 | NA | 5 |
| 2 | NA | 6 |
| 2 | NA | 7 |
| 2 | 2011-05-28 | 9 |
| 2 | NA | 10 |
| 2 | NA | 11 |
| 2 | NA | 15 |
| 2 | NA | 16 |
| 3 | NA | 20 |
| 3 | NA | 22 |
| 3 | NA | 23 |
| 3 | NA | 24 |
| 3 | 2012-05-28 | 28 |
| 3 | NA | 29 |
| 3 | NA | 25 |
| 3 | NA | 24 |
| 3 | NA | 26 |
| 3 | NA | 24 |
---- ------------ -------
In short, I would like to make the following table
---- ------------ ------- -------
| ID | eventdate | Value | order |
---- ------------ ------- -------
| 1 | NA | 10 | -4 |
| 1 | NA | 11 | -3 |
| 1 | NA | 12 | -2 |
| 1 | NA | 11 | -1 |
| 1 | 2011-03-18 | 15 | 0 |
| 1 | NA | 17 | 1 |
| 1 | NA | 18 | 2 |
| 1 | NA | 15 | 3 |
| 2 | NA | 5 | -3 |
| 2 | NA | 6 | -2 |
| 2 | NA | 7 | -1 |
| 2 | 2011-05-28 | 9 | 0 |
| 2 | NA | 10 | 1 |
| 2 | NA | 11 | 2 |
| 2 | NA | 15 | 3 |
| 2 | NA | 16 | 4 |
| 3 | NA | 20 | -4 |
| 3 | NA | 22 | -3 |
| 3 | NA | 23 | -2 |
| 3 | NA | 24 | -1 |
| 3 | 2012-05-28 | 28 | 0 |
| 3 | NA | 29 | 1 |
| 3 | NA | 25 | 2 |
| 3 | NA | 24 | 3 |
| 3 | NA | 26 | 4 |
| 3 | NA | 24 | 5 |
---- ------------ ------- -------
Thank you very much in advance!
CodePudding user response:
You can find the position of non-NA value in eventdate
and subtract it with row position in that group.
library(dplyr)
df <- df %>%
group_by(ID) %>%
mutate(order = row_number() - match(TRUE, !is.na(eventdate))) %>%
ungroup
df
# A tibble: 26 × 4
# ID eventdate Value order
# <int> <chr> <int> <int>
# 1 1 NA 10 -4
# 2 1 NA 11 -3
# 3 1 NA 12 -2
# 4 1 NA 11 -1
# 5 1 2011-03-18 15 0
# 6 1 NA 17 1
# 7 1 NA 18 2
# 8 1 NA 15 3
# 9 2 NA 5 -3
#10 2 NA 6 -2
# … with 16 more rows
In base R, the same thing can be written as -
df <- transform(df, order = ave(!is.na(eventdate), ID,
FUN = function(x) seq_along(x) - match(TRUE, x)))