I have a list of data tags as strings, much like this:
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")
Note that some numbers, including "1", are sometimes missing. A normal sort, of course, puts the ABCD tags before the WXYZ tags, and then puts ABCD 11 before ABCD 2.
I can easily overcome the numbering issue with gtools::mixedsort. But, for reasons of problem-specific context, I also want the WXYZ tags to come before the ABCD ones.
For example, when data above is sorted as I need it, it should look like this:
dataSorted <- c("WXYZ 1", "WXYZ 3", "WXYZ 4", "WXYZ 5", "ABCD 2", "ABCD 3", "ABCD 4", "ABCD 11")
Thankfully, I only need to deal with those two types of tags now, but I figure I should ask for a general solution. Is there a way to make gtools::mixedsort do reverse alphabetical but normal numeric ordering? If I set decreasing = TRUE, then it also reverses the order of the numbers.
Right now I am just using a list to force the order, and that is not only inelegant, but since the numbers on the tags have no theoretical upper limit, it is also going to eventually break.
CodePudding user response:
We may extract the digits and non-digits separately, and then order the data after converting the non-digit part to a factor with its levels specified in the desired order:
data[order(factor(sub("\\s+\\d+", "", data), levels = c("WXYZ", "ABCD")),
           as.integer(sub("\\S+\\s+", "", data)))]
-output
[1] "WXYZ 1" "WXYZ 3" "WXYZ 4" "WXYZ 5"
[5] "ABCD 2" "ABCD 3" "ABCD 4" "ABCD 11"
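If the prefixes are not known in advance, the same idea can be generalised without listing the levels by hand. The following is only a sketch, assuming every tag is a prefix followed by whitespace and an integer; the helper name sort_tags is hypothetical. It relies on order() accepting a vector for decreasing, which requires method = "radix".

```r
# Hypothetical helper: assumes each tag looks like "<prefix> <integer>".
sort_tags <- function(tags) {
  prefix <- sub("\\s+\\d+$", "", tags)            # text before the number
  number <- as.integer(sub("^.*\\s+", "", tags))  # trailing integer
  # A vector-valued `decreasing` needs method = "radix":
  # prefix descending (reverse alphabetical), number ascending.
  tags[order(prefix, number, decreasing = c(TRUE, FALSE), method = "radix")]
}

data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")
sort_tags(data)
```

Note that the radix method compares strings in the C locale, so mixed-case prefixes may sort differently than they would with a locale-aware sort.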
CodePudding user response:
This works without any predefined levels or manually entered data. The only prerequisite is that the first item of each tag is a letter string and the second is a number.
First, split the strings on the space, then group by the letters and sort the numbers within each group. The groups are then ordered in reverse alphabetical order, and finally both parts are pasted back together.
# split
dat <- setNames( data.frame( t(data.frame( strsplit( data, " " ) )[1,]),
as.numeric( data.frame( strsplit( data, " " ) )[2,]) ), c("A","B") )
# A B
#c..ABCD....2.. ABCD 2
#c..ABCD....3.. ABCD 3
#c..WXYZ....1.. WXYZ 1
#c..WXYZ....5.. WXYZ 5
#c..WXYZ....3.. WXYZ 3
#c..WXYZ....4.. WXYZ 4
#c..ABCD....4.. ABCD 4
#c..ABCD....11.. ABCD 11
# group and order
dat_agr <- aggregate( B ~ A, dat, sort, simplify=F )
dat_ord <- dat_agr[order(dat_agr[,"A"], decreasing=T),]
# A B
#2 WXYZ 1, 3, 4, 5
#1 ABCD 2, 3, 4, 11
# bring back together
unlist(lapply( dat_ord$A, function(x) sapply(
dat_ord[grep(x, dat_ord$A),"B"], function(y) paste(x,y) ) ))
[1] "WXYZ 1" "WXYZ 3" "WXYZ 4" "WXYZ 5" "ABCD 2" "ABCD 3" "ABCD 4"
[8] "ABCD 11"
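The same split, order, and recombine steps can probably be written more compactly. This is a sketch under the same assumption (one letter string and one number per tag, separated by a single space); the names parts and result are only illustrative. It uses -xtfrm() to reverse the alphabetical order of the letters while keeping the numbers ascending.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# split: one row per tag, column 1 = letters, column 2 = number
parts <- do.call(rbind, strsplit(data, " ", fixed = TRUE))
dat <- data.frame(A = parts[, 1], B = as.numeric(parts[, 2]))

# order: letters descending (negated ranks from xtfrm), numbers ascending
dat <- dat[order(-xtfrm(dat$A), dat$B), ]

# bring back together
result <- paste(dat$A, dat$B)
result
```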