How to get single elements in a vector with repeated observations

I am trying to extract one value per run of consecutively repeated observations from a vector in R. For example, given

v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)

I need a function that returns

c(1,2,1,2,1,2,1)

I thought of using a for loop, something like:

uniq_v <- v[1]
# compare each element with the next one; keep the next one when it differs
for (i in seq_len(length(v) - 1)) {
  if (v[i] != v[i + 1]) {
    uniq_v <- c(uniq_v, v[i + 1])
  }
}

I am pretty sure that there is a better, simpler way, but I cannot figure it out. Thank you, Giuseppe

CodePudding user response：

How about this (using dplyr):

v[v!=lead(v)] %>% head(-1)

CodePudding user response：

library(dplyr)
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)

The following two solutions are equivalent. Both have the problem that they drop the last value of the expected output.

v[v != lead(v)] %>% head(-1)
#> [1] 1 2 1 2 1 2
v[v != v[c(2:length(v), NA)]] |> head(-1)
#> [1] 1 2 1 2 1 2

The reason is that the last comparison is 1 != NA, which returns NA where we need TRUE. Comparing with identical() instead of != fixes this:

v[!mapply(identical, v, lead(v))]
#> [1] 1 2 1 2 1 2 1
v[!mapply(identical, v, v[c(2:length(v), NA)])]
#> [1] 1 2 1 2 1 2 1
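
As a minimal sketch of the NA issue described above, the snippet below first shows how the two comparisons behave, then avoids the NA padding altogether by comparing each element with the one before it rather than the one after. The helper name drop_consecutive_dups is hypothetical, this variant is not taken from the answers in this thread, and it assumes the input itself contains no NA values.

```r
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)

# A comparison against NA is neither TRUE nor FALSE
na_cmp <- 1 != NA                    # NA
id_cmp <- identical(1, NA_real_)     # FALSE, so !id_cmp is TRUE

# Hypothetical helper (assumes x has no NA values):
# always keep the first element, then keep every element that
# differs from its predecessor. No NA padding is involved.
drop_consecutive_dups <- function(x) {
  if (length(x) == 0) return(x)
  x[c(TRUE, x[-1] != x[-length(x)])]
}

drop_consecutive_dups(v)
#> [1] 1 2 1 2 1 2 1
```

Because the first element is kept unconditionally, the last run is never lost, which is the failure mode of the lead()-based versions.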

The easiest and probably fastest solution, however, is rle(v)$values suggested by @Chris.

rle(v)$values
#> [1] 1 2 1 2 1 2 1
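
For readers unfamiliar with it, base R's rle() computes a run-length encoding: it returns the value of each run of consecutive equal elements together with the run's length. The short sketch below only illustrates that structure on the same v; the variable name r is just for illustration.

```r
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)

r <- rle(v)

r$lengths   # how long each run is
#> [1] 3 3 3 1 2 5 3
r$values    # the value of each run, i.e. the desired output
#> [1] 1 2 1 2 1 2 1

# inverse.rle() rebuilds the original vector from the encoding
identical(inverse.rle(r), v)
#> [1] TRUE
```

The lengths component is a useful by-product if you also need to know how many times each value was repeated.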

library(microbenchmark)
microbenchmark(
  v[!mapply(identical, v, lead(v))],
  v[!mapply(identical, v, v[c(2:length(v), NA)])],
  rle(v)$values
)
#> Unit: microseconds
#>                                             expr  min    lq   mean median   uq    max neval
#>                v[!mapply(identical, v, lead(v))] 63.0 65.95 68.308  67.70 69.2  115.5   100
#>  v[!mapply(identical, v, v[c(2:length(v), NA)])] 36.7 37.70 38.993  38.20 39.4   65.5   100
#>                                    rle(v)$values 11.1 12.80 14.472  14.45 15.8   32.4   100

Created on 2022-06-10 by the reprex package (v2.0.1)