I have a dataset of ids, months and some metric x.
library(tidyverse)
library(lubridate)
df <- data.frame(
id = c("a1", "a1", "a1", "b1", "b1", "b1"),
month = c("2021-01-01", "2021-03-01", "2021-04-01", "2021-05-01", "2021-06-01", "2021-08-01"),
x = c(34,56,76, 12, 13, 14),
month_start = c("2021-01-01", "2021-01-01", "2021-01-01", "2021-05-01", "2021-05-01", "2021-05-01")
)
df <- df %>% mutate(month = as.Date(month), month_start = as.Date(month_start))
and would like to create new columns holding each customer's x for the 2nd and 4th months. I tried the code below, but it failed because for id == "a1" the 2nd month is missing from the data (error: Input x_growth_start can't be recycled to size 3).
df %>%
  group_by(id) %>%
  mutate(x_growth_start = x[month == month_start %m+% months(1)],
         x_growth_end = x[month == month_start %m+% months(3)])
I realize that when the value doesn't exist, the subset actually returns integer(0). Can I make the code run, say by having it return NA instead of integer(0)?
I tried what was suggested in "Input `typ` can't be recycled to size in R", but it didn't work.
Thanks in advance.
CodePudding user response:
You should use match(). It returns NA when no match is found.
df %>%
  group_by(id) %>%
  mutate(x_growth_start = x[match(month_start %m+% months(1), month)],
         x_growth_end = x[match(month_start %m+% months(3), month)])
# id month x month_start x_growth_start x_growth_end
# <chr> <date> <dbl> <date> <dbl> <dbl>
# 1 a1 2021-01-01 34 2021-01-01 NA 76
# 2 a1 2021-03-01 56 2021-01-01 NA 76
# 3 a1 2021-04-01 76 2021-01-01 NA 76
# 4 b1 2021-05-01 12 2021-05-01 13 14
# 5 b1 2021-06-01 13 2021-05-01 13 14
# 6 b1 2021-08-01 14 2021-05-01 13 14
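
To see why match() avoids the recycling error, here is a minimal base R sketch using the a1 rows from the question. The variable names (target, by_logical, by_match, found) are made up for illustration. Logical subsetting drops every FALSE position, so a missing month leaves a zero-length vector (numeric(0) here, since x is double) that mutate() cannot recycle to the group size. match() returns a single NA position instead, and indexing with NA gives a length-one NA.

```r
# Values for id "a1" from the question
x <- c(34, 56, 76)
month <- as.Date(c("2021-01-01", "2021-03-01", "2021-04-01"))
target <- as.Date("2021-02-01")  # the 2nd month, which is absent for a1

# Logical subsetting: all comparisons are FALSE, so nothing is kept
by_logical <- x[month == target]
length(by_logical)   # zero-length result

# match(): no match gives NA, and x[NA] is a single NA
by_match <- x[match(target, month)]
length(by_match)     # length one
is.na(by_match)

# When the month exists, match() returns its position as usual
found <- x[match(as.Date("2021-04-01"), month)]
```

Note that match() only returns the first matching position, so this assumes each id has at most one row per month, as in the example data.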