I am trying to keep a single observation from each run of consecutively repeated values in a vector in R. As an example:
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)
What I need is basically a function that gives this output:
c(1,2,1,2,1,2,1)
I thought of a for loop for doing this, that should be something like:
uniq_v <- v[1]
for (i in seq_len(length(v) - 1)) {
  # append the next value only when it differs from the current one
  if (v[i] != v[i + 1]) {
    uniq_v <- c(uniq_v, v[i + 1])
  }
}
I am pretty sure that there is a better, simpler way, but I cannot figure it out. Thank you, Giuseppe
CodePudding user response:
How about this (using dplyr):
v[v!=lead(v)] %>% head(-1)
CodePudding user response:
library(dplyr)
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)
These two solutions are equivalent. They have the problem that they drop the value of the last run.
v[v != lead(v)] %>% head(-1)
#> [1] 1 2 1 2 1 2
v[v != v[c(2:length(v), NA)]] |> head(-1)
#> [1] 1 2 1 2 1 2
The reason is that the last comparison is 1 != NA, which returns NA when we'd need TRUE. If we change it to this, it works:
v[!mapply(identical, v, lead(v))]
#> [1] 1 2 1 2 1 2 1
v[!mapply(identical, v, v[c(2:length(v), NA)])]
#> [1] 1 2 1 2 1 2 1
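As a side note, the NA can be avoided altogether in base R by comparing each element with the previous one instead of the next one, and always keeping the first element. The following is only a minimal sketch: dedup_runs is a hypothetical helper name (not from the answers above), and it assumes the input vector contains no NA values.

```r
# Hypothetical helper: keep the first element of every run of equal values.
# Assumes x has no NA values (a comparison with NA would yield NA).
dedup_runs <- function(x) {
  n <- length(x)
  if (n < 2) return(x)  # nothing to compare for length 0 or 1
  # x[-1] != x[-n] compares each element with its predecessor;
  # the leading TRUE always keeps the first element.
  x[c(TRUE, x[-1] != x[-n])]
}

v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)
1 != NA  # evaluates to NA, which is why the lead() version loses the last run
dedup_runs(v)
#> [1] 1 2 1 2 1 2 1
```

Because no padding NA is introduced, there is no trailing element to trim with head(-1).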
The easiest and probably fastest solution, however, is rle(v)$values, suggested by @Chris.
rle(v)$values
#> [1] 1 2 1 2 1 2 1
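To see why this works, it may help to look at what rle() returns: a run-length encoding with one entry per run, holding the run's length and its value. A short sketch on the same vector (the variable name r is just for illustration):

```r
v <- c(1,1,1,2,2,2,1,1,1,2,1,1,2,2,2,2,2,1,1,1)
r <- rle(v)
r$lengths  # how long each run is
#> [1] 3 3 3 1 2 5 3
r$values   # the value of each run, i.e. the desired output
#> [1] 1 2 1 2 1 2 1
# inverse.rle() rebuilds the original vector from the encoding
identical(inverse.rle(r), v)
#> [1] TRUE
```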
library(microbenchmark)
microbenchmark(
v[!mapply(identical, v, lead(v))],
v[!mapply(identical, v, v[c(2:length(v), NA)])],
rle(v)$values
)
#> Unit: microseconds
#>                                             expr  min    lq   mean median   uq   max neval
#>                v[!mapply(identical, v, lead(v))] 63.0 65.95 68.308  67.70 69.2 115.5   100
#>  v[!mapply(identical, v, v[c(2:length(v), NA)])] 36.7 37.70 38.993  38.20 39.4  65.5   100
#>                                    rle(v)$values 11.1 12.80 14.472  14.45 15.8  32.4   100
Created on 2022-06-10 by the reprex package (v2.0.1)