Reset a ranking when a variable exceeds a value using dplyr

Suppose I have the following data:

library(dplyr)

df <- tibble(ID=c(1,2,3,4,5,6,7,8,9,10),
             ID2=c(1,1,1,1,2,2,2,3,4,4),
             VAR=c(25,10,120,60,85,90,20,40,60,150))

I want to add a new column with a ranking that restarts at 1 whenever ID2 changes or VAR is greater than 100. The desired result is:

# A tibble: 10 x 4
      ID   ID2   VAR  RANK
   <dbl> <dbl> <dbl> <dbl>
 1     1     1    25     1
 2     2     1    10     2
 3     3     1   120     1
 4     4     1    60     2
 5     5     2    85     1
 6     6     2    90     2
 7     7     2    20     3
 8     8     3    40     1
 9     9     4    60     1
10    10     4   150     1

I know how to add a ranking column that resets only when ID2 changes:

df %>%
  arrange(ID2) %>%
  group_by(ID2) %>%
  mutate(RANK = row_number())

... but handling both conditions at the same time is more difficult. How can I do this using dplyr?

CodePudding user response：

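You can group by both ID2 and cumsum(VAR > 100). The cumulative sum increases by one at every row where VAR exceeds 100, so each such row starts a new group: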

library(dplyr)
df %>% 
  group_by(ID2, cumVAR = cumsum(VAR > 100)) %>% 
  mutate(RANK = row_number())

Output:

# A tibble: 10 x 5
# Groups:   ID2, cumVAR [6]
      ID   ID2   VAR cumVAR  RANK
   <dbl> <dbl> <dbl>  <int> <int>
 1     1     1    25      0     1
 2     2     1    10      0     2
 3     3     1   120      1     1
 4     4     1    60      1     2
 5     5     2    85      1     1
 6     6     2    90      1     2
 7     7     2    20      1     3
 8     8     3    40      1     1
 9     9     4    60      1     1
10    10     4   150      2     1
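
To see why this works, it can help to look at the helper counter on its own. The sketch below rebuilds the same df, prints the values of cumsum(VAR > 100), and then removes the grouping and the helper column once RANK has been computed. The names counter, block and res, and the trailing ungroup() and select() steps, are additions for illustration and not part of the answer above.

```r
library(dplyr)

df <- tibble(ID  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
             ID2 = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4),
             VAR = c(25, 10, 120, 60, 85, 90, 20, 40, 60, 150))

# VAR > 100 is TRUE on rows 3 and 10, so the running count steps up there.
counter <- cumsum(df$VAR > 100)
print(counter)  # 0 0 1 1 1 1 1 1 1 2

res <- df %>%
  # 'block' (hypothetical name) is computed over all rows before grouping
  group_by(ID2, block = cumsum(VAR > 100)) %>%
  mutate(RANK = row_number()) %>%
  ungroup() %>%        # drop the grouping so later steps are not affected
  select(-block)       # drop the helper column

print(res)
```

The counter follows the current row order of the whole data frame, so this assumes the rows are already in the order in which the ranks should run.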

CodePudding user response：

rowid() from data.table is useful here as well. It numbers the rows within each combination of its arguments, so no group_by() is needed:

library(dplyr)
library(data.table)
df %>%
   mutate(RANK = rowid(ID2, cumsum(VAR > 100)))

Output:

# A tibble: 10 × 4
      ID   ID2   VAR  RANK
   <dbl> <dbl> <dbl> <int>
 1     1     1    25     1
 2     2     1    10     2
 3     3     1   120     1
 4     4     1    60     2
 5     5     2    85     1
 6     6     2    90     2
 7     7     2    20     3
 8     8     3    40     1
 9     9     4    60     1
10    10     4   150     1
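
For comparison, the same grouping key can also be written in base R without either package. This is a minimal sketch using ave(), not something taken from the answers above, and the name rank_base is hypothetical.

```r
df <- data.frame(ID  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 ID2 = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4),
                 VAR = c(25, 10, 120, 60, 85, 90, 20, 40, 60, 150))

# ave() applies FUN within each combination of the grouping vectors;
# seq_along() then numbers the rows of each group starting at 1.
rank_base <- ave(seq_len(nrow(df)),
                 df$ID2, cumsum(df$VAR > 100),
                 FUN = seq_along)

df$RANK <- rank_base
print(df)
```

As with the dplyr and data.table versions, the result depends on the row order of df, because cumsum() runs over the rows as they are currently stored.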