Home > Blockchain >  Compare number in group in given year to different group in previous year
Compare number in group in given year to different group in previous year

Time:10-18

I have a data frame in the following form (grouped by Year, Subject and Level). There are two „Levels“ and around 25 Subjects (the number varies per year) over 30 years.

Code to reproduce:

structure(list(Year = c(2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L), Subject = c("Englisch", 
"Englisch", "GK", "GK", "Mathematik", "Mathematik", "Englisch", "Englisch", "GK", "GK", "Mathematik", "Mathematik", "Englisch", "Englisch", "GK", "GK", "Mathematik", "Mathematik", "Englisch", 
"Englisch"), Level = c(20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L), Students = c(27104L, 24841L, 29945L, 2862L, 29907L, 27802L, 28837L, 26357L, 31815L, 2880L, 31721L, 29115L, 30324L, 28064L, 32924L, 3508L, 32862L, 30862L, 32124L, 29624L)), row.names = c(NA, -20L), groups = structure(list(Year = c(2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2008L), Subject = c("Englisch", "GK", "Mathematik", "Englisch", "GK", "Mathematik", "Englisch", "GK", "Mathematik", "Englisch"), .rows = structure(list(1:2,     3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16, 17:18, 19:20), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", "tbl", "data.frame")) -> rep
Year Subject Students Level
2000 English 100 1
2000 Maths 50 1
2000 ... ... ...
2001 Maths 30 2
2001 English 70 2

After each year, students usually ascend one level. To find out how many students did not ascend a level I need to calculate Level 1 Students of English in 2000 - Level 2 Students of English in 2001

The new data frame should thus look like this:

Year Subject Students Level Difference
2000 English 100 1 NA
2000 Maths 50 1 NA
2000 ... ... ... ...
2001 Maths 30 2 20
2001 English 70 2 30

I have tried dplyr:lag() but I don't know how to make it work with several groups.

CodePudding user response:

You should be able to get that with lag after arrange and group_by -

library(dplyr)

df %>%
  arrange(Subject, Year, Level) %>%
  group_by(Subject) %>%
  mutate(Difference = lag(Students) - Students)
  #You can also use diff
  #mutate(Difference = c(NA, -diff(Students)))

CodePudding user response:

We may do this with data.table

library(data.table)
setDT(df)[order(Subject, Year, Level)][, 
   Difference := shift(Students) - Students), by = Subject]
  •  Tags:  
  • r
  • Related