Home > Software engineering >  R Sort or order with custom compare function
R Sort or order with custom compare function

Time:11-28

Can I pass a custom compare function to order that, given two items, indicates which one is ranked higher?

In my specific case I have the following list.

scores <- list(
    'a' = c(1, 1, 2, 3, 4, 4),
    'b' = c(1, 2, 2, 2, 3, 4),
    'c' = c(1, 1, 2, 2, 3, 4),
    'd' = c(1, 2, 3, 3, 3, 4)
)

If we take two vectors a and b, the index of the first element i at which a[i] > b[i] or a[i] < b[i] should determine what vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that it doesn't matter that scores[['d']][5] < scores[['a']][5]).

Comparing two of those vectors could look something like this.

compare <- function(a, b) {
    # get first element index at which vectors differ
    i <- which.max(a != b)
    if(a[i] > b[i])
        1
    else if(a[i] < b[i])
        -1
    else
        0
}

The sorted keys of scores by using this comparison function should then be d, b, a, c.

From other solutions I've found, they mess with the data before ordering or introduce S3 classes and apply comparison attributes. With the former I fail to see how to mess with my data (maybe turn it into strings? But then what about numbers above 9?), with the latter I feel uncomfortable introducing a new class into my R package only for comparing vectors. And there doesn't seem to be a sort of comparator parameter I'd want to pass to order.

CodePudding user response:

Here's an attempt. I've explained every step in the comments.

compare <- function(a, b) {
  
  # subtract vector a from vector b
  comparison <- a - b
  # get the first non-zero result
  restult <- comparison[comparison != 0][1]
  # return 1 if result == 1 and 2 if result == -1 (0 if equal)
  if(is.na(restult)) {return(0)} else if(restult == 1) {return(1)} else {return(2)}
  
}

compare_list <- function(list_) {
  
  # get combinations of all possible comparison
  comparisons <- combn(length(list_), 2)
  # compare all possibilities
  results <- apply(comparisons, 2, function(x) {
    # get the "winner"
    x[compare(list_[[x[1]]], list_[[x[2]]])]
  })
  # get frequency table (how often a vector "won" -> this is the result you want)
  fr_tab <- table(results)
  # vector that is last in comparison
  last_vector <- which(!(1:length(list_) %in% as.numeric(names(fr_tab))))
  # return the sorted results and add the last vectors name
  c(as.numeric(names(sort(fr_tab, decreasing = T))), last_vector)
  
}

If you run the function on your example, the result is

> compare_list(scores)
[1] 4 2 1 3

I haven't dealt with the case that the two vectors are identical, you haven't explained how to deal with this.

CodePudding user response:

The native R way to do this is to introduce an S3 class.

There are two things you can do with the class. You can define a method for xtfrm that converts your list entries to numbers. That could be vectorized, and conceivably could be really fast.

But you were asking for a user defined compare function. This is going to be slow because R function calls are slow, and it's a little clumsy because nobody does it. But following the instructions in the xtfrm help page, here's how to do it:

scores <- list(
  'a' = c(1, 1, 2, 3, 4, 4),
  'b' = c(1, 2, 2, 2, 3, 4),
  'c' = c(1, 1, 2, 2, 3, 4),
  'd' = c(1, 2, 3, 3, 3, 4)
)

# Add a class to the list

scores <- structure(scores, class = "lexico")

# Need to keep the class when subsetting

`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")

# Careful here:  identical() might be too strict

`==.lexico` <- function(a, b) {identical(a, b)}

`>.lexico` <- function(a, b) {
  a <- a[[1]]
  b <- b[[1]]
  i <- which(a != b)
  length(i) > 0 && a[i[1]] > b[i[1]]
}

is.na.lexico <- function(a) FALSE

sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#> 
#> $a
#> [1] 1 1 2 3 4 4
#> 
#> $b
#> [1] 1 2 2 2 3 4
#> 
#> $d
#> [1] 1 2 3 3 3 4
#> 
#> attr(,"class")
#> [1] "lexico"

Created on 2021-11-27 by the reprex package (v2.0.1)

This is the opposite of the order you asked for, because by default sort() sorts to increasing order. If you really want d, b, a, c use sort(scores, decreasing = TRUE.

  • Related