Custom letter ordering with normal numeric ordering in R

Time:11-20

I have a list of data tags as strings, much like this:

data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

Note that some numbers, including "1", are sometimes missing. A normal sort, of course, puts the ABCD tags before the WXYZ tags, and then puts ABCD 11 before ABCD 2.
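To make the problem concrete, here is a minimal base-R sketch of the default behaviour described above. The `method = "radix"` argument is an assumption added only so the result does not depend on the session locale; it is not part of the original question.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# Plain character sort: compares strings character by character,
# so "ABCD 11" lands before "ABCD 2" because "1" sorts before "2".
plain <- sort(data, method = "radix")
print(plain)
# [1] "ABCD 11" "ABCD 2"  "ABCD 3"  "ABCD 4"  "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5"
```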

I can easily overcome the numbering issue with gtools::mixedsort. But, for reasons of problem-specific context, I also want the WXYZ tags to come before the ABCD ones.

For example, when data above is sorted as I need it, it should look like this:

dataSorted <- c("WXYZ 1", "WXYZ 3", "WXYZ 4", "WXYZ 5", "ABCD 2", "ABCD 3", "ABCD 4", "ABCD 11")

Thankfully, I only need to deal with those two types of tags now, but I figure I should ask for a general solution. Is there a way to make gtools::mixedsort do reverse alpha but normal numeric ordering? If I set decreasing = TRUE then it also reverses all the number orders.

Right now I am just using a list to force the order, and that is not only inelegant, but since the numbers on the tags have no theoretical upper limit, it is also going to eventually break.

CodePudding user response:

We can extract the letter part and the number part of each tag separately, then order first by the letter part (converted to a factor whose levels give the desired group order) and then by the number part (converted to integer):

data[order(factor(sub("\\s+\\d+", "", data), 
   levels = c("WXYZ", "ABCD")), as.integer(sub("\\S+\\s+", "", data)))]

-output

[1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5" 
[5] "ABCD 2"  "ABCD 3"  "ABCD 4"  "ABCD 11"
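For more than two tag types, the same factor-based idea could be wrapped in a small helper. This is only a sketch: the name `sort_tags` and its `prefix_order` argument are hypothetical, and it assumes every tag has the form "letters, whitespace, integer".

```r
# Hypothetical helper: order tags by a user-supplied prefix order,
# then numerically within each prefix.
sort_tags <- function(x, prefix_order) {
  prefix <- sub("\\s+\\d+$", "", x)            # drop the trailing number
  number <- as.integer(sub("^.*\\s+", "", x))  # keep only the trailing number
  unknown <- setdiff(prefix, prefix_order)
  if (length(unknown) > 0) {
    stop("Prefixes missing from prefix_order: ", paste(unknown, collapse = ", "))
  }
  x[order(factor(prefix, levels = prefix_order), number)]
}

data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")
sorted <- sort_tags(data, c("WXYZ", "ABCD"))
print(sorted)
# [1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5"  "ABCD 2"  "ABCD 3"  "ABCD 4"  "ABCD 11"
```

The explicit check for unlisted prefixes is a design choice: without it, `factor()` would silently turn them into `NA` and `order()` would push those tags to the end.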

CodePudding user response:

This works without predefined levels or manually entered data, provided the groups should come out in reverse alphabetical order. The only prerequisite is that each tag consists of a letter string followed by a number, separated by a space.

First, split the strings by space, followed by a grouping by letters and a sort of the numbers within the group. Then both have to be brought back together.

# split
dat <- setNames( data.frame( t(data.frame( strsplit( data, " " ) )[1,]),
  as.numeric( data.frame( strsplit( data, " " ) )[2,]) ), c("A","B") )
#                   A  B
#c..ABCD....2..  ABCD  2
#c..ABCD....3..  ABCD  3
#c..WXYZ....1..  WXYZ  1
#c..WXYZ....5..  WXYZ  5
#c..WXYZ....3..  WXYZ  3
#c..WXYZ....4..  WXYZ  4
#c..ABCD....4..  ABCD  4
#c..ABCD....11.. ABCD 11

# group and order
dat_agr <- aggregate( B ~ A, dat, sort, simplify=FALSE )
dat_ord <- dat_agr[order(dat_agr[,"A"], decreasing=TRUE),]
#     A           B
#2 WXYZ  1, 3, 4, 5
#1 ABCD 2, 3, 4, 11

# bring back together
# match group names exactly, so a tag like "AB" cannot also pick up "ABCD"
unlist(lapply( dat_ord$A, function(x) sapply( 
  dat_ord[dat_ord$A == x,"B"], function(y) paste(x,y) ) ))
#[1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5"  "ABCD 2"  "ABCD 3"  "ABCD 4" 
#[8] "ABCD 11"
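The split, group, and sort steps above can also be sketched more compactly with a single `order()` call. This is an alternative, not the answer's own method: it relies on base R's `order()` accepting one `decreasing` value per key when `method = "radix"` is used, and the variable names `parts`, `prefix` and `number` are illustrative.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# Split each tag into its letter part and its number part.
parts  <- strsplit(data, " ", fixed = TRUE)
prefix <- vapply(parts, `[`, character(1), 1)
number <- as.numeric(vapply(parts, `[`, character(1), 2))

# Letters descending, numbers ascending; the radix method allows a
# separate 'decreasing' flag for each sort key.
result <- data[order(prefix, number, decreasing = c(TRUE, FALSE), method = "radix")]
print(result)
# [1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5"  "ABCD 2"  "ABCD 3"  "ABCD 4"  "ABCD 11"
```

Note that the radix method compares strings as in the C locale, which is fine for plain uppercase ASCII prefixes like these but may differ from locale-aware collation for other text.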