I want to calculate per id per row how many months there are before reaching the value 1 in a another colomn. I have a couple of thousands IDs en many months to work with.
id period sick
1 1-1-1990 0
1 1-2-1990 0
1 1-3-1990 1
1 1-4-1990 0
2 1-1-1990 0
2 1-2-1990 0
2 1-3-1990 1
2 1-4-1990 0
2 1-5-1990 0
and I would like to have an output like this
id period sick months_until
1 1-1-1990 0 2
1 1-2-1990 0 1
1 1-3-1990 1 0
1 1-4-1990 0 -1
2 1-1-1990 0 2
2 1-2-1990 0 1
2 1-3-1990 1 0
2 1-4-1990 0 -1
2 1-5-1990 0 -2
CodePudding user response:
A possible solution, based on dplyr
:
library(dplyr)
df %>%
group_by(id) %>%
mutate(months_until = (which(sick == 1) - row_number())) %>%
ungroup
#> # A tibble: 9 × 4
#> id period sick months_until
#> <int> <chr> <int> <int>
#> 1 1 1-1-1990 0 2
#> 2 1 1-2-1990 0 1
#> 3 1 1-3-1990 1 0
#> 4 1 1-4-1990 0 -1
#> 5 2 1-1-1990 0 2
#> 6 2 1-2-1990 0 1
#> 7 2 1-3-1990 1 0
#> 8 2 1-4-1990 0 -1
#> 9 2 1-5-1990 0 -2
CodePudding user response:
Using data.table:
setDT(d)[, months_until := month(as.Date(period, "%d-%m-%Y"))[ sick == 1 ] - seq_len(.N), by = id][]
# id period sick months_until
# 1: 1 1-1-1990 0 2
# 2: 1 1-2-1990 0 1
# 3: 1 1-3-1990 1 0
# 4: 1 1-4-1990 0 -1
# 5: 2 1-1-1990 0 2
# 6: 2 1-2-1990 0 1
# 7: 2 1-3-1990 1 0
# 8: 2 1-4-1990 0 -1
# 9: 2 1-5-1990 0 -2
CodePudding user response:
Here is another idea without using any libraries and utilising seq()
by creating a custom function, i.e.
f1 <- function(x){
i1 <- which(x == 1) - 1
i2 <- i1 - (length(x) - 1)
seq(i1, i2)
}
df$new <- with(df, ave(sick, id, FUN = f1))
> df
id period sick new
1 1 1-1-1990 0 2
2 1 1-2-1990 0 1
3 1 1-3-1990 1 0
4 1 1-4-1990 0 -1
5 2 1-1-1990 0 2
6 2 1-2-1990 0 1
7 2 1-3-1990 1 0
8 2 1-4-1990 0 -1
9 2 1-5-1990 0 -2