I have an unsorted vector of length N. Each element of the vector appears is present precisely twice (the vector length is an even number). I have a custom sorting algorithm, and the goal is to iterate until the vector achieves a state in which each element is adjacent to its copy.
Unsorted vector = {A,F,J,E,F,A,J,E}
A valid sorted state = {A,A,J,J,E,E,F,F}
Another valid sorted state = {J,J,A,A,F,F,E,E}
So my question lies in what is the fastest way to check if a sorted state is valid so that I can speed up my iterations? For long vectors, this will dictate most of my scaling ability.
CodePudding user response:
Something quick and dirty but I'm not sure it will always work:
all(duplicated(x) == c(FALSE,TRUE))
This is relying on the fact that the two same values will always be next to each other, one not-duplicated, the next duplicated. Seems to work with the test sets:
x <- c("A", "F", "J", "E", "F", "A", "J", "E")
s1 <- c("A", "A", "J", "J", "E", "E", "F", "F")
s2 <- c("J", "J", "A", "A", "F", "F", "E", "E")
all(duplicated(x) == c(FALSE,TRUE))
#[1] FALSE
all(duplicated(s1) == c(FALSE,TRUE))
#[1] TRUE
all(duplicated(s2) == c(FALSE,TRUE))
#[1] TRUE
And is pretty quick, looking through a million length vector in 5 hundredths of a second on my machine:
x <- rep(1:1e6, each=2)
system.time(all(duplicated(x) == c(FALSE,TRUE)))
# user system elapsed
# 0.04 0.00 0.05
CodePudding user response:
Another option is to use rle
function:
v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
rle_obj <- rle(v3)
all(rle_obj$lengths > 1)
test:
> rle_obj <- rle(v1)
> all(rle_obj$lengths > 1)
[1] FALSE
> rle_obj <- rle(v2)
> all(rle_obj$lengths > 1)
[1] TRUE
> rle_obj <- rle(v3)
> all(rle_obj$lengths > 1)
[1] TRUE
>
CodePudding user response:
An option involves converting the vector
(as the length
is even
and an element is present exactly twice) to a two-row matrix, get the unique
and test whether the number of rows is 1. If values duplicated are adjacent, while adding the dim
attributes with matrix
, the second row will be exactly the same as the first
f1 <- function(x)
{
nrow(unique(matrix(x, nrow = 2))) == 1
}
-testing
> v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
> v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
> v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
> f1(v1)
[1] FALSE
> f1(v2)
[1] TRUE
> f1(v3)
[1] TRUE
Or slightly more faster
f2 <- function(x)
{
sum(duplicated(matrix(x, nrow = 2))) == 1
}
-testing
> f2(v1)
[1] FALSE
> f2(v2)
[1] TRUE
> f2(v3)
[1] TRUE
-benchmarks
#thelatemail
> f3 <- function(x) all(duplicated(x) == c(FALSE,TRUE))
#TarJae
> f4 <- function(x) {rle_obj <- rle(x); all(rle_obj$lengths > 1)}
> x1 <- rep(1:1e8, each = 2)
> system.time(f1(x1))
user system elapsed
2.649 0.456 3.111
> system.time(f2(x1))
user system elapsed
2.258 0.433 2.694
> system.time(f3(x1))
user system elapsed
9.972 1.272 11.233
> system.time(f4(x1))
user system elapsed
7.051 3.281 10.333