Given a data frame:
set.seed(123)
dat <- data.frame("var1" = runif(10),
                  "indicator" = c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2)))
dat
var1 indicator
1 0.8895393 1
2 0.6928034 1
3 0.6405068 0
4 0.9942698 0
5 0.6557058 1
6 0.7085305 1
7 0.5440660 0
8 0.5941420 0
9 0.2891597 1
10 0.1471136 1
How can I number each consecutive run of ones in the "indicator" column, so that the first run of ones is assigned 1, the second run 2, and so on, while the zeros stay 0?
The resulting dataframe should look like this:
var1 indicator new_col
1 0.96302423 1 1
2 0.90229905 1 1
3 0.69070528 0 0
4 0.79546742 0 0
5 0.02461368 1 2
6 0.47779597 1 2
7 0.75845954 0 0
8 0.21640794 0 0
9 0.31818101 1 3
10 0.23162579 1 3
Looking for a tidyverse solution.
CodePudding user response:
In base R, it can be done with rle:
dat$new_col <- inverse.rle(within.list(rle(dat$indicator),
    {values[values == 1] <- seq_len(sum(values))}))
-output
> dat
var1 indicator new_col
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
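The rle one-liner above can be hard to read at first, so here is a minimal sketch that unpacks it into three steps. It assumes the same indicator vector as in the question; the names `r` and `new_col` are only illustrative.

```r
# Same indicator vector as in the question
indicator <- c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2))

# Step 1: run-length encode; one entry per run of identical values
r <- rle(indicator)
r$lengths  # 2 2 2 2 2
r$values   # 1 0 1 0 1

# Step 2: number the runs of ones 1, 2, 3, ...; runs of zeros keep the value 0
r$values[r$values == 1] <- seq_len(sum(r$values == 1))
r$values   # 1 0 2 0 3

# Step 3: expand the runs back to the original length
new_col <- inverse.rle(r)
new_col    # 1 1 0 0 2 2 0 0 3 3
```

The key point is that the renumbering happens on the compressed representation (one value per run), and inverse.rle then repeats each value for the length of its run.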
Or using dplyr (with rleid from data.table)
library(dplyr)
library(data.table)
dat %>%
  mutate(new_col = rleid(indicator) * indicator,
         new_col = match(new_col, unique(new_col[new_col != 0]), nomatch = 0))
-output
var1 indicator new_col
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
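Since the question asks for a tidyverse solution, the same idea can probably be written with dplyr alone. The sketch below assumes dplyr >= 1.1.0, where consecutive_id() is available as a counterpart to rleid(); the helper column `run_id` and the result name `res` are illustrative only.

```r
library(dplyr)

# Only the indicator column is needed for the illustration
dat <- data.frame(indicator = c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2)))

res <- dat %>%
  mutate(
    # One id per run of identical values: 1 1 2 2 3 3 4 4 5 5
    run_id = consecutive_id(indicator),
    # Rank only the run ids that belong to ones; zeros are masked with NA
    # and then set back to 0
    new_col = if_else(indicator == 1,
                      dense_rank(if_else(indicator == 1, run_id, NA_integer_)),
                      0L)
  ) %>%
  select(-run_id)

res$new_col  # 1 1 0 0 2 2 0 0 3 3
```

dense_rank() leaves missing values as NA, which is why the zeros are masked before ranking and filled in afterwards.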
Or with data.table
setDT(dat)[, new_col := fcoalesce(as.integer(factor(rleid(indicator) *
NA^!indicator)), 0L)]
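The `NA^!indicator` expression in the data.table version is compact, so a step-by-step sketch may help. It relies on the fact that in R `NA^0` is 1 while `NA^1` is NA; the intermediate names `mask`, `runs` and `ids` are illustrative and not part of the original answer.

```r
library(data.table)

indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# !indicator is TRUE where indicator is 0; NA^TRUE is NA, NA^FALSE is 1
mask <- NA^!indicator              # 1 1 NA NA 1 1 NA NA 1 1

# Run ids for every run, with the runs of zeros blanked out
runs <- rleid(indicator) * mask    # 1 1 NA NA 3 3 NA NA 5 5

# factor() drops the NAs and renumbers the remaining ids consecutively
ids <- as.integer(factor(runs))    # 1 1 NA NA 2 2 NA NA 3 3

# Replace the NAs (the zeros of the indicator) with 0
new_col <- fcoalesce(ids, 0L)
new_col                            # 1 1 0 0 2 2 0 0 3 3
```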
CodePudding user response:
Using cumsum:
dat$v <- with(dat, cumsum(indicator == 1 & dplyr::lag(indicator == 0, default = TRUE)))
dat$v[dat$indicator == 0] <- 0
var1 indicator v
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
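The cumsum approach works by flagging the first element of each run of ones and then counting those flags. A minimal base R sketch of the same idea, without dplyr::lag, could look like this; the names `prev` and `starts` are illustrative only.

```r
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# Previous value of the indicator, treating the element before the first as 0
prev <- c(0, head(indicator, -1))

# TRUE exactly where a run of ones begins
starts <- indicator == 1 & prev == 0   # TRUE at positions 1, 5 and 9

# Running count of run starts, zeroed out wherever the indicator is 0
v <- cumsum(starts) * indicator
v                                      # 1 1 0 0 2 2 0 0 3 3
```

Multiplying by the indicator replaces the separate step that resets the zero rows.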