Fastest validation of sorting vector of pairs of elements until they are unorderly paired

Time:02-03

I have an unsorted vector of length N. Each element of the vector is present precisely twice (so the vector length is an even number). I have a custom sorting algorithm, and the goal is to iterate until the vector reaches a state in which each element is adjacent to its copy.

Unsorted vector = {A,F,J,E,F,A,J,E}

A valid sorted state = {A,A,J,J,E,E,F,F}

Another valid sorted state = {J,J,A,A,F,F,E,E}

So my question is: what is the fastest way to check whether a state is valid, so that I can speed up my iterations? For long vectors, this check will dictate most of my scaling ability.
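To make the goal concrete: a state is valid exactly when positions 1-2, 3-4, 5-6, ... hold equal values, so it can be checked directly by comparing the odd- and even-indexed elements. A minimal sketch (the helper name is mine, not from the question):

```r
# Valid state: every odd-indexed element equals its right-hand neighbour.
is_valid_state <- function(x) {
  odd  <- x[c(TRUE, FALSE)]   # positions 1, 3, 5, ...
  even <- x[c(FALSE, TRUE)]   # positions 2, 4, 6, ...
  all(odd == even)
}

is_valid_state(c("A", "F", "J", "E", "F", "A", "J", "E"))  # FALSE
is_valid_state(c("A", "A", "J", "J", "E", "E", "F", "F"))  # TRUE
```

The answers below reach the same result through duplicated(), rle(), and a matrix reshape.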

CodePudding user response:

Something quick and dirty, though I'm not sure it will always work:

all(duplicated(x) == c(FALSE,TRUE))

This relies on the fact that in a valid state the two copies of a value are always next to each other: the first is not flagged as duplicated, the next one is. It seems to work with the test sets:

x <- c("A", "F", "J", "E", "F", "A", "J", "E")
s1 <- c("A", "A", "J", "J", "E", "E", "F", "F")
s2 <- c("J", "J", "A", "A", "F", "F", "E", "E")

all(duplicated(x) == c(FALSE,TRUE))
#[1] FALSE

all(duplicated(s1) == c(FALSE,TRUE))
#[1] TRUE

all(duplicated(s2) == c(FALSE,TRUE))
#[1] TRUE

And it is pretty quick, checking a vector of two million elements in about five hundredths of a second on my machine:

x <- rep(1:1e6, each=2)
system.time(all(duplicated(x) == c(FALSE,TRUE)))
#   user  system elapsed 
#   0.04    0.00    0.05 
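Note that the comparison above depends on R recycling the length-2 pattern c(FALSE, TRUE) along the full length of duplicated(x). Spelling the expected pattern out with rep() is equivalent and makes that assumption explicit (a sketch, not part of the original answer):

```r
x <- c("A", "A", "J", "J", "E", "E", "F", "F")

# Build the expected duplicated() pattern explicitly instead of recycling it:
pattern <- rep(c(FALSE, TRUE), length(x) / 2)
all(duplicated(x) == pattern)  # same result as comparing against c(FALSE, TRUE)
```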

CodePudding user response:

Another option is to use the rle function: since each element appears exactly twice, in a valid state every value forms a run of length 2, so no run should have length 1:

v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")

rle_obj <- rle(v3)
all(rle_obj$lengths > 1)

Test:


> rle_obj <- rle(v1)
> all(rle_obj$lengths > 1)
[1] FALSE

> rle_obj <- rle(v2)
> all(rle_obj$lengths > 1)
[1] TRUE

> rle_obj <- rle(v3)
> all(rle_obj$lengths > 1)
[1] TRUE
> 

CodePudding user response:

An option involves converting the vector (as the length is even and each element is present exactly twice) to a two-row matrix, taking the unique rows, and testing whether the number of remaining rows is 1. If the duplicated values are adjacent, then after adding the dim attributes with matrix, the second row will be exactly the same as the first.

f1 <- function(x) {
  nrow(unique(matrix(x, nrow = 2))) == 1
}

Testing:

> v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
> v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
> v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
> f1(v1)
[1] FALSE
> f1(v2)
[1] TRUE
> f1(v3)
[1] TRUE

Or slightly faster:

f2 <- function(x) {
  sum(duplicated(matrix(x, nrow = 2))) == 1
}

Testing:

> f2(v1)
[1] FALSE
> f2(v2)
[1] TRUE
> f2(v3)
[1] TRUE

Benchmarks:

#thelatemail
> f3 <- function(x) all(duplicated(x) == c(FALSE,TRUE))
#TarJae
> f4 <- function(x) {rle_obj <- rle(x); all(rle_obj$lengths > 1)}

> x1 <- rep(1:1e8, each = 2)
> system.time(f1(x1))
   user  system elapsed 
  2.649   0.456   3.111 
> system.time(f2(x1))
   user  system elapsed 
  2.258   0.433   2.694 
> system.time(f3(x1))
   user  system elapsed 
  9.972   1.272  11.233 
> system.time(f4(x1))
   user  system elapsed 
  7.051   3.281  10.333 