Create runs of repeating values in dplyr

Time:07-27

Example data:

example_data <-
  data.frame(value = c(1,3,4,6,7,8,4,6,9,0),
             group = c("Not applicable",
                       "Large group",
                       "Large group",
                       "Not applicable",
                       "Group of 1",
                       "Large group",
                       "Large group",
                       "Large group",
                       "Group of 1",
                       "Not applicable"))

I would like to assign group numbers, starting with 1, to groups (both "Large group" and "Group of 1"), and zeroes to "Not applicable" values, using dplyr.

There can be more than one "Not applicable" value in a row. "Group of 1" always contains exactly one row. "Large group" may contain any number of rows.

Desired output:

   value          group group_number
1      1 Not applicable            0
2      3    Large group            1
3      4    Large group            1
4      6 Not applicable            0
5      7     Group of 1            2
6      8    Large group            3
7      4    Large group            3
8      6    Large group            3
9      9     Group of 1            4
10     0 Not applicable            0

I tried this solution from the answers to my previous question:

example_data %>%
  mutate(group_number = with(rle(group != "Not applicable"), 
                      rep(cumsum(values) * values, lengths)))

But I got:

   value          group group_number
1      1 Not applicable            0
2      3    Large group            1
3      4    Large group            1
4      6 Not applicable            0
5      7     Group of 1            2
6      8    Large group            2
7      4    Large group            2
8      6    Large group            2
9      9     Group of 1            2
10     0 Not applicable            0

Rows 5 to 9 all get the number 2, but I would like each adjacent "Large group" and "Group of 1" run to get its own number.
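To illustrate why the attempt above lumps rows 5 to 9 together, here is a small sketch comparing the runs that rle() sees in the logical test with the runs in the labels themselves. The names runs_logical and runs_label are only illustrative.

```r
group <- c("Not applicable", "Large group", "Large group", "Not applicable",
           "Group of 1", "Large group", "Large group", "Large group",
           "Group of 1", "Not applicable")

# Runs of the logical test: every row that is not "Not applicable" is TRUE,
# so rows 5-9 collapse into a single TRUE run of length 5
runs_logical <- rle(group != "Not applicable")
runs_logical$lengths
# 1 2 1 5 1

# Runs of the labels: rows 5-9 split into three separate runs
runs_label <- rle(group)
runs_label$lengths
# 1 2 1 1 3 1 1
```

Since the logical vector cannot tell "Group of 1" from "Large group", any numbering built on it will give those neighbouring runs the same id.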

Answer:

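library(dplyr)
example_data %>%
  # rleid() numbers every run of identical labels; the logical factor zeroes the "Not applicable" rows
  mutate(gr = data.table::rleid(group) * (group != 'Not applicable'),
         # dense_rank() maps the ids to 1, 2, 3, ...; subtracting 1 sends the zeroed rows back to 0
         gr = dense_rank(gr) - 1) # or even gr = as.numeric(factor(gr)) - 1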

   value          group gr
1      1 Not applicable  0
2      3    Large group  1
3      4    Large group  1
4      6 Not applicable  0
5      7     Group of 1  2
6      8    Large group  3
7      4    Large group  3
8      6    Large group  3
9      9     Group of 1  4
10     0 Not applicable  0
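
If you would rather not depend on data.table, the same numbering can probably be sketched with base rle() applied to the labels themselves instead of the logical test. This is only an illustration under the question's assumptions: the column name group_number follows the desired output, and keep and result are hypothetical names. as.character() is there in case group is a factor (the data.frame() default before R 4.0), since rle() needs an atomic vector.

```r
library(dplyr)

example_data <- data.frame(
  value = c(1, 3, 4, 6, 7, 8, 4, 6, 9, 0),
  group = c("Not applicable", "Large group", "Large group", "Not applicable",
            "Group of 1", "Large group", "Large group", "Large group",
            "Group of 1", "Not applicable")
)

result <- example_data %>%
  mutate(group_number = with(rle(as.character(group)), {
    # keep is TRUE for every run that should receive a number
    keep <- values != "Not applicable"
    # count the kept runs, zero out the others, then expand back to row level
    rep(cumsum(keep) * keep, lengths)
  }))

result$group_number
# 0 1 1 0 2 3 3 3 4 0
```

Unlike dense_rank(gr) - 1, this does not rely on at least one "Not applicable" row being present to anchor the zero. Both versions will merge two "Group of 1" rows that sit directly next to each other, because they form a single run of identical labels.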