I have a panel dataset where some groups have observations starting at an earlier year than others and would like to calculate the change in value from the earliest possible time period. I expected that by using case_when within mutate, R would not try to evaluate the code for groups where the earlier dates do not exist, but this does not seem to be the case. I have included a reprex below.
library("dplyr")
dataset <- data.frame(names=c("a","a","a","b","b"),
values=c(2,3,4,2,3),
dates=c("2010","2011","2012","2011","2012"))
dataset_calc <- dataset %>%
group_by(names) %>%
mutate(new_val = case_when(names=="a" ~ values-values[dates=="2010"],
TRUE ~ values-values[dates=="2011"]))
Is there a better solution for what I would like to do? The resulting dataframe should be something like:
names values dates new_val
1 a 2 2010 0
2 a 3 2011 1
3 a 4 2012 2
4 b 2 2011 0
5 b 3 2012 1
CodePudding user response:
If you arrage the data by group, then you can just subtract off the first value for each group
dataset %>%
group_by(names) %>%
arrange(dates) %>%
mutate(new_val = values - first(values))
If you wanted to hard code different reference years, you would want to use the case_when
part over the year rather than the values. For example
dataset %>%
group_by(names) %>%
mutate(
ref_year = case_when(names=="a" ~ "2010", TRUE~"2011"),
new_val = values - values[dates==ref_year],
ref_year = NULL
)
(you don't need to use the temporary ref_year
variable, I just added it here for clarity of how the function was working)