R: How to count length of intervals between specific word/symbol in a vector?

I have a vector that contains a series of text and numbers, like this:

t <- c("A", 1:3, "A", 1:4, "A", 1:3)
t
#>  [1] "A" "1" "2" "3" "A" "1" "2" "3" "4" "A" "1" "2" "3"

Created on 2022-08-06 by the reprex package (v2.0.1)

The actual data is taken from a PDF. The data frame was collapsed into a single column vector, and the wrap length is uneven for some reason (probably because of merged cells). To process this data efficiently, I want to know the length from each "A" to the next "A" (or to the end of the vector). In this example the answer would be 4, 5, 4, counting the "A" itself. (Edit: I originally wrote 3, 4, 3 by mistake; that counts only the elements after each "A".) I have tried many different methods but can't find one that works. Does anyone know how to do this?

Answer:

An alternative using rle() (run-length encoding), which counts the non-"A" elements after each "A":

with(rle(t == "A"), subset(lengths, !values))
#> [1] 3 4 3
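To see what the one-liner does, here is a minimal sketch that prints the intermediate rle() result. The variable names r, gaps, counts and r2 are illustrative only. Adding 1 to recover the 4, 5, 4 from the question's edit assumes that every run of non-"A" elements is preceded by exactly one "A". Note also that rle() never produces zero-length runs, so adjacent "A"s yield no entry at all rather than a 0.

```r
t <- c("A", 1:3, "A", 1:4, "A", 1:3)

# rle() splits the logical vector into runs of TRUE ("A") and FALSE (anything else)
r <- rle(t == "A")
r$lengths   # 1 3 1 4 1 3
r$values    # TRUE FALSE TRUE FALSE TRUE FALSE

# Keep only the lengths of the FALSE runs: the elements after each "A"
gaps <- r$lengths[!r$values]
gaps        # 3 4 3

# Count the "A" itself as well (assumes no two "A"s are adjacent)
counts <- gaps + 1L
counts      # 4 5 4

# Caveat: adjacent "A"s produce no FALSE run between them, so no 0 is reported
r2 <- rle(c("A", "A", "1") == "A")
r2$lengths[!r2$values]   # 1 (not 0 1)
```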

Answer:

You want the number of elements:

(1) between adjacent "A"s;
(2) from the last "A" (excluding it) to the end.

We can use either of the following:

diff(c(which(t == "A"), length(t) + 1)) - 1
#[1] 3 4 3

diff(which(c(t, "A") == "A")) - 1
#[1] 3 4 3

Essentially, we append an "A" at the end to turn case (2) into case (1). If the last element of t happens to be an "A", the last value in the result will be 0.
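The padding step can be spelled out with intermediate variables. This is a minimal sketch, and the names pos, between, with_a, t2 and trailing are illustrative only. It also shows that dropping the final "- 1" gives the 4, 5, 4 from the question's edit, because each count then includes its own "A".

```r
t <- c("A", 1:3, "A", 1:4, "A", 1:3)

# Positions of every "A", plus a virtual "A" one position past the end
pos <- c(which(t == "A"), length(t) + 1)
pos        # 1 5 10 14

# Distance between consecutive markers, minus the marker itself
between <- diff(pos) - 1
between    # 3 4 3

# Without "- 1", each count includes its "A" (the question's 4, 5, 4)
with_a <- diff(pos)
with_a     # 4 5 4

# If t already ends with "A", the padded marker lands right after it: trailing 0
t2 <- c(t, "A")
trailing <- diff(c(which(t2 == "A"), length(t2) + 1)) - 1
trailing   # 3 4 3 0
```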

Extension:

If you also want to know the number of elements from the beginning to the first "A" (excluding it), we can pad a leading "A" (or, equivalently, a leading position 0):

diff(c(0, which(t == "A"), length(t) + 1)) - 1
#[1] 0 3 4 3

diff(which(c("A", t, "A") == "A")) - 1
#[1] 0 3 4 3

Here, the first value is 0, because the first element of t happens to be an "A".
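A different technique, not used in either answer above, may be handy if the goal is to process each "A"-led chunk rather than only count it. This is a sketch under that assumption: cumsum(t == "A") gives a group id that increases at every "A", split() cuts t into one chunk per group, and lengths() counts each chunk including its "A". Elements before the first "A" fall into a group named "0". The variable names grp, chunks, t3 and lead are illustrative only.

```r
t <- c("A", 1:3, "A", 1:4, "A", 1:3)

# cumsum() increments at every "A", so each "A" starts a new group id
grp <- cumsum(t == "A")
grp                # 1 1 1 1 2 2 2 2 2 3 3 3 3

# One chunk per "A"; lengths() counts each chunk, including its "A"
chunks <- split(t, grp)
lengths(chunks)    # named: "1" = 4, "2" = 5, "3" = 4

# Elements before the first "A" form group "0"
t3 <- c("x", "y", t)
lead <- lengths(split(t3, cumsum(t3 == "A")))
lead               # named: "0" = 2, "1" = 4, "2" = 5, "3" = 4
```

For t3, the group-"0" count (2) matches the first value that the leading-pad extension reports. The other counts are one larger because each chunk includes its "A".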