R incrementing a variable in dplyr

I have the following data frame, with one row per student per grade:

library(dplyr)

# Create a sample dataframe
df <- data.frame(
  student = c("A", "A", "A","B","B", "B", "C", "C","C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age= c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)

I want to update each student's age so that it goes up by one each year after the first year in which an age is recorded. Missing ages before that year, and the first recorded age itself, should stay unchanged. For example, student A's age should be NA, 6, 7, student B's age should be 7, 8, 9, and student C's age should be NA, NA, 9.

CodePudding user response:

How about this:

library(dplyr)
df <- data.frame(
  student = c("A", "A", "A","B","B", "B", "C", "C","C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age= c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)
df %>% 
  group_by(student) %>% 
  mutate(age = age + cumsum(!is.na(age)) - 1)
#> # A tibble: 9 × 3
#> # Groups:   student [3]
#>   student grade   age
#>   <chr>   <dbl> <dbl>
#> 1 A           1    NA
#> 2 A           2     6
#> 3 A           3     7
#> 4 B           1     7
#> 5 B           2     8
#> 6 B           3     9
#> 7 C           1    NA
#> 8 C           2    NA
#> 9 C           3     9

Created on 2022-12-30 by the reprex package (v2.0.1)
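To see why this works, it can help to expose the running count as its own column. The sketch below is only an illustration of the same idea, using the sample data from the question; the column names `seen` and `new_age` are made up here. Note the assumption it relies on: within each student, the recorded age repeats unchanged after the first non-missing value, and no NA appears after that point.

```r
library(dplyr)

df <- data.frame(
  student = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age = c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)

res <- df %>%
  group_by(student) %>%
  mutate(
    # running count of non-missing ages within each student
    seen = cumsum(!is.na(age)),
    # the first recorded age gets +0, the next +1, and so on; NA + anything stays NA
    new_age = age + seen - 1
  ) %>%
  ungroup()

print(res)
```

For student A, `seen` is 0, 1, 2, so the offsets added to the age are -1, 0, 1, and the leading NA is unaffected because arithmetic on NA returns NA.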

CodePudding user response:

In data.table, assuming the rows are already in the correct order:

library(data.table)
setDT(df)[, new_age := age + rowid(age) - 1, by = .(student)]
#    student grade age new_age
# 1:       A     1  NA      NA
# 2:       A     2   6       6
# 3:       A     3   6       7
# 4:       B     1   7       7
# 5:       B     2   7       8
# 6:       B     3   7       9
# 7:       C     1  NA      NA
# 8:       C     2  NA      NA
# 9:       C     3   9       9
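
Both answers depend on the recorded age repeating unchanged after the first non-missing value. If that might not hold (for example, an NA in the middle of a student's rows), one possible alternative is to anchor on the first recorded age and add the number of grades elapsed since then. This is a sketch under the assumption that `grade` increases by one per year; `first_known` and `new_age` are names invented for illustration.

```r
library(dplyr)

df <- data.frame(
  student = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  grade = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  age = c(NA, 6, 6, 7, 7, 7, NA, NA, 9)
)

res2 <- df %>%
  arrange(student, grade) %>%
  group_by(student) %>%
  mutate(
    # position of the first non-missing age within the student (NA if none)
    first_known = which(!is.na(age))[1],
    # rows before the first recorded age stay NA; later rows add the grades elapsed
    new_age = ifelse(
      row_number() >= first_known,
      age[first_known] + grade - grade[first_known],
      NA_real_
    )
  ) %>%
  ungroup()

print(res2)
```

On the sample data this gives the same ages as the two answers above; it only behaves differently when the age column is not constant after its first recorded value.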