I have a sequence of 0s and 1s in this manner:
xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
I want to build a vector that counts each streak of zeroes and adds that count to the first 1 that follows the streak. For this specific vector, the result should be:
yy <- c(1, 1, 1, 0, 0, 3, 0, 2, 0, 0, 0, 4)
What is the fastest and most efficient way to do this in R?
CodePudding user response:
This base R implementation may not be the most efficient one, so it would be interesting to compare performance if others come up with answers.
code
idx_add <- which(xx == 1 & c(NA, xx[-length(xx)]) == 0)
xx_rle <- rle(xx)
n_add <- xx_rle$lengths[xx_rle$values == 0]
yy <- xx
yy[idx_add] <- yy[idx_add] + n_add
explanation
idx_add <- which(xx == 1 & c(NA, xx[-length(xx)]) == 0)
This line finds the indexes in xx that we will add to. Those are the places where we have a 1 preceded by at least one 0. So we get c(6, 8, 12).
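To see why this works, it can help to look at the shifted vector on its own. This is only a small illustration; the names prev and idx are not part of the answer's code:

```r
xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)

# xx shifted right by one position, so prev[i] holds xx[i - 1];
# the first element has no predecessor, hence the NA
prev <- c(NA, xx[-length(xx)])

# TRUE where a 1 directly follows a 0; the first element evaluates to NA
is_first_one <- xx == 1 & prev == 0

# which() ignores NA and returns only the TRUE positions
idx <- which(is_first_one)
idx
```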
xx_rle <- rle(xx)
Here we use the rle() (run-length encoding) function to get the length of all the streaks of consecutive values in the vector xx. xx_rle has two elements: lengths, the lengths of the streaks; and values, their values (1s and 0s).
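For reference, this is roughly what the encoding looks like for the example vector (inverse.rle() is the base R function that undoes rle()):

```r
xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
xx_rle <- rle(xx)

xx_rle$lengths  # 3 2 1 1 1 3 1
xx_rle$values   # 1 0 1 0 1 0 1

# inverse.rle() rebuilds the original vector from the encoding
roundtrip_ok <- identical(inverse.rle(xx_rle), xx)
roundtrip_ok
```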
n_add <- xx_rle$lengths[xx_rle$values == 0]
Here we extract the streak lengths for only the streaks of zeroes.
yy <- xx
yy[idx_add] <- yy[idx_add] + n_add
Now create a copy of xx and add the zero streak lengths to the first 1 following each streak. This gives your desired result!
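One caveat: if the vector ends with a streak of zeroes, that streak has no following 1, so n_add ends up one element longer than idx_add. A minimal sketch of a variant that pairs each zero streak with the 1 that follows it, using only rle(), might look like this. The function name add_zero_streaks is made up for illustration, and the choice to leave a trailing zero streak untouched is an assumption about the desired behaviour:

```r
add_zero_streaks <- function(x) {
  r <- rle(x)
  ends <- cumsum(r$lengths)        # last index of each run
  starts <- ends - r$lengths + 1   # first index of each run

  # runs of 1s that directly follow a run of 0s
  hit <- which(r$values == 1 & c(FALSE, head(r$values, -1) == 0))

  y <- x
  # add the length of the preceding zero run to the first 1 of each such run
  y[starts[hit]] <- y[starts[hit]] + r$lengths[hit - 1]
  y
}

xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
add_zero_streaks(xx)
# [1] 1 1 1 0 0 3 0 2 0 0 0 4
```

With this version a trailing streak such as c(1, 0, 0) is simply returned unchanged.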
CodePudding user response:
One base R solution could be:
with(rle(xx), rep(values + c(0, head(lengths * (values == 0), -1)), lengths))
[1] 1 1 1 0 0 3 0 2 0 0 0 4
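Unpacked into steps, the one-liner does roughly the following. The intermediate names r, zero_len, carry and yy_runs are only for illustration:

```r
xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
r <- rle(xx)

# length of each run, kept only for the runs of zeroes
zero_len <- r$lengths * (r$values == 0)   # 0 2 0 1 0 3 0

# shift by one run, so each run sees the zero count of the run before it
carry <- c(0, head(zero_len, -1))         # 0 0 2 0 1 0 3

# add the carried count to each run's value, then expand back to full length
yy_runs <- rep(r$values + carry, r$lengths)
yy_runs

# caveat: a run of several 1s after zeroes is incremented as a whole
two_ones <- with(rle(c(0, 1, 1)),
                 rep(values + c(0, head(lengths * (values == 0), -1)), lengths))
two_ones  # 0 2 2
```

Because rep() repeats the incremented value over the whole run, this matches the desired output as long as every run of 1s that follows zeroes has length 1, which is the case in xx.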
CodePudding user response:
Using dplyr:
Data:
xx <- c(1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1)
Code:
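library(dplyr)

yy <- as.data.frame(xx) %>%
  # give every streak of zeroes its own id; non-zero rows all get id 0
  mutate(group = ifelse(xx != 0, 1, 0),
         group = cumsum(group) + 1,
         group = ifelse(xx != 0, 0, group)) %>%
  group_by(group) %>%
  # streak length + 1, stored in a new column because a grouping
  # column can't be overwritten inside mutate()
  mutate(streak = n() + 1) %>%
  ungroup() %>%
  # the first 1 after a zero streak takes the value from the row before it
  mutate(yy = ifelse(xx != 0 & lag(xx) == 0, lag(streak), xx),
         yy = ifelse(is.na(yy), xx, yy)) %>%
  select(yy) %>%
  pull()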
Output:
[1] 1 1 1 0 0 3 0 2 0 0 0 4