I need to use r (dplyr) to group_by two variables (schoolid, grade) to create a third sequential numeric variable, that restarts from 1:n for each grade within each school. In other words, the new_id increments (1:n) for each contiguous block of unique grade values. It should look like the new_id column in the data frame below.
structure(list(schoolid = c(201L, 201L, 201L, 201L, 201L, 201L,
201L, 201L, 201L, 201L, 201L, 201L, 201L, 202L, 202L, 202L, 202L,
202L, 202L, 202L, 202L, 202L), grade = c(3L, 3L, 4L, 4L, 5L,
5L, 3L, 3L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 3L, 4L,
4L), new_id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)), row.names = c(NA, 22L
), class = "data.frame")
I want to produce to following dataframe with the new_id column added but right now I just have schoolid and grade
OR
I tried to use the following code but the output does not do what I am looking for as shown in the table above
data %>%
group_by(schoolid, grade) %>%
mutate(new_id = 1:n())
CodePudding user response:
If I'm understanding right, I think this should work. I use the data.table::rleid
function to create a helper variable of contiguous values.
Next time please share sample data as copy/pasteable text so we don't have to transcribe from an image :)
sample_data = data.frame(
schoolid = 201,
grade = c(3,3,4,4,5,5,3,3,5,5,5,3,3)
)
library(dplyr)
sample_data %>%
group_by(schoolid) %>%
mutate(
g = data.table::rleid(grade)
) %>%
group_by(schoolid, grade) %>%
mutate(
new_id = dense_rank(g)
) %>%
ungroup()
# # A tibble: 13 × 4
# schoolid grade g new_id
# <dbl> <dbl> <int> <int>
# 1 201 3 1 1
# 2 201 3 1 1
# 3 201 4 2 1
# 4 201 4 2 1
# 5 201 5 3 1
# 6 201 5 3 1
# 7 201 3 4 2
# 8 201 3 4 2
# 9 201 5 5 2
# 10 201 5 5 2
# 11 201 5 5 2
# 12 201 3 6 3
# 13 201 3 6 3