r - Group by two columns and create a new sequential number for values of one of the columns-CodePudding

I need to use r (dplyr) to group_by two variables (schoolid, grade) to create a third sequential numeric variable, that restarts from 1:n for each grade within each school. In other words, the new_id increments (1:n) for each contiguous block of unique grade values. It should look like the new_id column in the data frame below.

structure(list(schoolid = c(201L, 201L, 201L, 201L, 201L, 201L, 
201L, 201L, 201L, 201L, 201L, 201L, 201L, 202L, 202L, 202L, 202L, 
202L, 202L, 202L, 202L, 202L), grade = c(3L, 3L, 4L, 4L, 5L, 
5L, 3L, 3L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 3L, 4L, 
4L), new_id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)), row.names = c(NA, 22L
), class = "data.frame")

I want to produce to following dataframe with the new_id column added but right now I just have schoolid and grade

data table I want to produce

I tried to use the following code but the output does not do what I am looking for as shown in the table above


data %>%
  group_by(schoolid, grade) %>%
  mutate(new_id = 1:n())

CodePudding user response：

If I'm understanding right, I think this should work. I use the data.table::rleid function to create a helper variable of contiguous values.

Next time please share sample data as copy/pasteable text so we don't have to transcribe from an image :)

sample_data = data.frame(
  schoolid = 201,
  grade = c(3,3,4,4,5,5,3,3,5,5,5,3,3)
)

library(dplyr)
sample_data %>%
  group_by(schoolid) %>%
  mutate(
    g = data.table::rleid(grade)
  ) %>%
  group_by(schoolid, grade) %>%
  mutate(
    new_id = dense_rank(g)
  ) %>%
  ungroup()
# # A tibble: 13 × 4
#    schoolid grade     g new_id
#       <dbl> <dbl> <int>  <int>
#  1      201     3     1      1
#  2      201     3     1      1
#  3      201     4     2      1
#  4      201     4     2      1
#  5      201     5     3      1
#  6      201     5     3      1
#  7      201     3     4      2
#  8      201     3     4      2
#  9      201     5     5      2
# 10      201     5     5      2
# 11      201     5     5      2
# 12      201     3     6      3
# 13      201     3     6      3