assigning sequentially increasing group number to chunks of data with same indicator value in R

Time:05-11

Given a data frame:

set.seed(123)
dat <- data.frame("var1" = runif(10),
                  "indicator" = c(rep(1,2),rep(0,2), rep(1,2), rep(0,2), rep(1,2)))
       
        var1 indicator
1  0.8895393         1
2  0.6928034         1
3  0.6405068         0
4  0.9942698         0
5  0.6557058         1
6  0.7085305         1
7  0.5440660         0
8  0.5941420         0
9  0.2891597         1
10 0.1471136         1

How can I number the runs of ones in the "indicator" column, so that the first chunk of ones is assigned "1", the second chunk of ones is assigned "2", and so on, while rows where the indicator is 0 stay at 0?

The resulting dataframe should look like this:

         var1 indicator new_col
1  0.96302423         1       1
2  0.90229905         1       1
3  0.69070528         0       0
4  0.79546742         0       0
5  0.02461368         1       2
6  0.47779597         1       2
7  0.75845954         0       0
8  0.21640794         0       0
9  0.31818101         1       3
10 0.23162579         1       3

Looking for a tidyverse solution.

CodePudding user response:

In base R, this can be done with rle (run-length encoding):

dat$new_col <- inverse.rle(within.list(rle(dat$indicator), 
     {values[values == 1] <- seq_len(sum(values))}))

-output

> dat
        var1 indicator new_col
1  0.2875775         1       1
2  0.7883051         1       1
3  0.4089769         0       0
4  0.8830174         0       0
5  0.9404673         1       2
6  0.0455565         1       2
7  0.5281055         0       0
8  0.8924190         0       0
9  0.5514350         1       3
10 0.4566147         1       3
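
The rle approach above can be unpacked into separate steps. This is only an illustrative sketch in base R; the standalone `indicator` vector and the name `r` are assumptions made for the example, not part of the original answer.

```r
# Same indicator pattern as in the question
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# Run-length encoding: one entry per run of identical values
r <- rle(indicator)
# r$lengths holds the length of each run, r$values the value of each run

# Number the runs of ones 1, 2, 3, ... and leave the runs of zeros at 0
r$values[r$values == 1] <- seq_len(sum(r$values == 1))

# Expand the runs back to one value per row
new_col <- inverse.rle(r)
print(new_col)
```

Because the numbering is done on the run values rather than on the rows, the runs can have different lengths.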

Or using dplyr (rleid comes from data.table):

library(dplyr)
library(data.table)
dat %>%
   mutate(new_col = rleid(indicator) * indicator,
   new_col = match(new_col, unique(new_col[new_col != 0]), nomatch = 0))

-output

        var1 indicator new_col
1  0.2875775         1       1
2  0.7883051         1       1
3  0.4089769         0       0
4  0.8830174         0       0
5  0.9404673         1       2
6  0.0455565         1       2
7  0.5281055         0       0
8  0.8924190         0       0
9  0.5514350         1       3
10 0.4566147         1       3
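
Since the question asks for a tidyverse solution, the same idea might also be written without data.table. The sketch below assumes dplyr 1.1.0 or later, where `consecutive_id()` is available as a counterpart to `rleid()`; the helper column `run` and the result name `res` are made up for the example.

```r
library(dplyr)

set.seed(123)
dat <- data.frame(var1 = runif(10),
                  indicator = c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2)))

res <- dat %>%
  mutate(
    # one id per run of identical indicator values
    run = consecutive_id(indicator),
    # position of each run among the runs of ones; runs of zeros get 0
    new_col = match(run, unique(run[indicator == 1]), nomatch = 0L)
  ) %>%
  select(-run)

print(res)
```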

Or with data.table

setDT(dat)[, new_col := fcoalesce(as.integer(factor(rleid(indicator) * 
            NA^!indicator)), 0L)]
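
The data.table one-liner is dense, so here is a step-by-step reading of it. This is a sketch on a plain vector; the intermediate names `mask`, `runs` and `ids` are assumptions introduced only for illustration.

```r
library(data.table)

indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# NA^FALSE is 1 and NA^TRUE is NA, so this is 1 where indicator is 1, NA where it is 0
mask <- NA^!indicator

# Run ids, kept only for the runs of ones (runs of zeros become NA)
runs <- rleid(indicator) * mask

# Renumber the surviving run ids as 1, 2, 3, ...
ids <- as.integer(factor(runs))

# Replace the NA entries (the runs of zeros) with 0
new_col <- fcoalesce(ids, 0L)
print(new_col)
```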

CodePudding user response:

Using cumsum:

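# A run of ones starts where indicator is 1 and the previous row is 0;
# default = TRUE treats the (missing) row before the first one as a 0
dat$v <- with(dat, cumsum(indicator == 1 & dplyr::lag(indicator == 0, default = TRUE)))
dat$v[dat$indicator == 0] <- 0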

        var1 indicator v
1  0.2875775         1 1
2  0.7883051         1 1
3  0.4089769         0 0
4  0.8830174         0 0
5  0.9404673         1 2
6  0.0455565         1 2
7  0.5281055         0 0
8  0.8924190         0 0
9  0.5514350         1 3
10 0.4566147         1 3
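
The cumsum idea can also be written without dplyr::lag. The following is a minimal base R sketch, assuming the indicator only contains 0 and 1; the name `starts` is introduced for the example.

```r
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# A run of ones starts wherever the value steps up from 0 to 1;
# prepending 0 lets a run that begins on the first row count as a start
starts <- diff(c(0, indicator)) == 1

# Count the starts seen so far, then zero out the rows where indicator is 0
v <- cumsum(starts) * indicator
print(v)
```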