I'm calculating the percent change of enrollment from academic year to academic year, but there are some academic years missing data, so I don't want it to calculate the change in those instances and keep it as blank instead of calculating a two year difference. I have multiple years, schools, and groups I am doing this by. Example data frame below and the code I am using currently. So I am missing 2016-17 in this example and don't want to calculate it for 17-18 then.
School Academic Year Group Enrollment pct_change
1 School 1 2018-19 Overall 450 ANSWER
2 School 1 2017-18 Overall 630 NA
3 School 1 2015-16 Overall 635 ANSWER
4 School 1 2014-15 Overall 750 ANSWER
5 School 1 2013-14 Overall 704 ANSWER
data <- data %>%
group_by(School, Group) %>%
mutate(pct_change = (((Enrollment-lead(Enrollment, order_by = `Academic Year`))/Enrollment)) * 100) %>%
ungroup()
CodePudding user response:
An option may be to expand the data for complete
year
library(dplyr)
library(tidyr)
data %>%
separate(`Academic Year`, into = c("Year", "Day"),
remove = FALSE, convert = TRUE) %>%
group_by(School, Group) %>%
complete(Year = full_seq(Year, period = 1)) %>%
mutate(pct_change = (((Enrollment-lead(Enrollment,
order_by = Year))/Enrollment)) * 100) %>%
ungroup()
filter(complete.cases(Enrollment)) %>%
select(-Year, -Day)