The rle() function returns a list with values and lengths. I have not found a way to subset the output to isolate the streaks of a particular value that does not involve calling rle() twice, or saving the output into an object to later subset (an added step).
For instance, for runs of heads (1's) in a series of fair coin tosses:
s <- sample(c(0, 1), 100, replace = TRUE)
rle(s)
Run Length Encoding
lengths: int [1:55] 1 2 1 2 1 2 1 2 2 1 ...
values : num [1:55] 0 1 0 1 0 1 0 1 0 1 ...
# Double-call:
rle(s)[[1]][rle(s)[[2]]==1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
# Adding an intermediate step:
r <- rle(s)
r$lengths[r$values==1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
I see that a very easy way of getting the streak lengths just for 1 is to simply tweak the rle() code (answer), but there may be an even simpler way.
CodePudding user response:
In base R:
with(rle(s), lengths[values==1])
[1] 1 3 2 2 1 1 1 3 2 1 1 3 1 1 1 1 1 2 3 1 2 1 3 3 1 2 1 1 2
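This works because with() evaluates the expression with the components of the list returned by rle() (lengths and values) available as variables, so rle() is called only once and nothing has to be assigned first. A minimal sketch on a small fixed vector, which makes the result easy to check by hand (x is a made-up input for illustration, not the s from the question):

```r
# Hypothetical fixed input so the result can be verified by eye
x <- c(1, 1, 0, 1, 0, 0, 1, 1, 1)

# rle(x) has lengths 2 1 1 2 3 and values 1 0 1 0 1;
# with() exposes both components inside the expression
ones <- with(rle(x), lengths[values == 1])
ones
# [1] 2 1 3
```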
CodePudding user response:
For a sequence of outcomes s, when interested solely in the lengths of the streaks of outcome oc:
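sk <- function(s, oc) {
  n <- length(s)
  y <- s[-1L] != s[-n]        # TRUE where consecutive elements differ
  i <- c(which(y), n)         # index of the last element of each run
  diff(c(0L, i))[s[i] == oc]  # run lengths, kept only for runs of value oc
}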
So to get the lengths for 1:
sk(s,1)
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
and likewise for 0:
sk(s,0)
[1] 1 1 1 1 2 2 2 2 4 1 1 2 1 1 1 1 1 1 3 1 1 2 6 2 1 1 4 4
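As a quick sanity check, the function above can be compared with the rle() approach on a small fixed vector. The sketch below repeats the definition of sk so that it runs on its own; x is a made-up input for illustration only.

```r
# Same logic as sk above, repeated so this snippet is self-contained
sk <- function(s, oc) {
  n <- length(s)
  y <- s[-1L] != s[-n]        # TRUE where consecutive elements differ
  i <- c(which(y), n)         # index of the last element of each run
  diff(c(0L, i))[s[i] == oc]  # run lengths, kept only for runs of value oc
}

# Hypothetical input: runs are 1 1 | 0 | 1 | 0 0 | 1 1 1
x <- c(1, 1, 0, 1, 0, 0, 1, 1, 1)

res1 <- sk(x, 1)
res0 <- sk(x, 0)
res1
# [1] 2 1 3
res0
# [1] 1 2

# Both should agree with subsetting the rle() output directly
agree <- identical(res1, with(rle(x), lengths[values == 1])) &&
  identical(res0, with(rle(x), lengths[values == 0]))
agree
# [1] TRUE
```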