Home > Blockchain >  How to calculate percent change in R when there are some years of data missing?
How to calculate percent change in R when there are some years of data missing?

Time:01-14

I'm calculating the percent change of enrollment from academic year to academic year, but there are some academic years missing data, so I don't want it to calculate the change in those instances and keep it as blank instead of calculating a two year difference. I have multiple years, schools, and groups I am doing this by. Example data frame below and the code I am using currently. So I am missing 2016-17 in this example and don't want to calculate it for 17-18 then.

 School      Academic Year  Group   Enrollment pct_change  
 1 School 1  2018-19        Overall 450        ANSWER 
 2 School 1  2017-18        Overall 630        NA
 3 School 1  2015-16        Overall 635        ANSWER
 4 School 1  2014-15        Overall 750        ANSWER
 5 School 1  2013-14        Overall 704        ANSWER 

data <- data %>%
  group_by(School, Group) %>% 
  mutate(pct_change = (((Enrollment-lead(Enrollment, order_by = `Academic Year`))/Enrollment)) * 100) %>%
  ungroup()

CodePudding user response:

An option may be to expand the data for complete year

library(dplyr)
library(tidyr)
data %>%
    separate(`Academic Year`, into = c("Year", "Day"), 
remove = FALSE, convert = TRUE) %>%
    group_by(School, Group) %>%
    complete(Year = full_seq(Year, period = 1)) %>%
     mutate(pct_change = (((Enrollment-lead(Enrollment, 
    order_by = Year))/Enrollment)) * 100) %>%
  ungroup()
    filter(complete.cases(Enrollment)) %>%
    select(-Year, -Day)
  • Related