How do I sort a vector with many strings of numbers in ascending order?

I have a list of names which I would like to sort by the last number in ascending order.

[1] "W2345_S-001-R1_1.csv"     "W2346_S-001-R1_10.csv"
[3] "W2347_S-001-R1_2.csv"     "W2348_S-001-R1_9.csv"
[5] "W2345_S-001-R2_1.csv"     "W2346_S-001-R2_10.csv"
[7] "W2347_S-001-R2_2.csv"     "W2348_S-001-R2_9.csv"

I would like to arrange them by R1, then R2. Within R1 or R2, the names should be ordered by the last number as 1, 2, 9, 10. Hence the output should be:

[1] "W2345_S-001-R1_1.csv"     "W2347_S-001-R1_2.csv"
[3] "W2348_S-001-R1_9.csv"     "W2346_S-001-R1_10.csv"
[5] "W2345_S-001-R2_1.csv"     "W2347_S-001-R2_2.csv"
[7] "W2348_S-001-R2_9.csv"     "W2346_S-001-R2_10.csv"

CodePudding user response：

A base R solution, for fun:

# Pull out the R number (v1) and the trailing index (v2) as numerics
d1 <- data.frame(v1 = as.numeric(gsub('^.*-R([0-9]+)_([0-9]+)\\.csv$', '\\1', x)), 
                 v2 = as.numeric(gsub('^.*-R([0-9]+)_([0-9]+)\\.csv$', '\\2', x)))
x[order(d1$v1, d1$v2)]

[1] "W2345_S-001-R1_1.csv"  "W2347_S-001-R1_2.csv"  "W2348_S-001-R1_9.csv"  "W2346_S-001-R1_10.csv" "W2345_S-001-R2_1.csv"  "W2347_S-001-R2_2.csv" 
[7] "W2348_S-001-R2_9.csv"  "W2346_S-001-R2_10.csv"
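
As a variation on the same idea, here is a minimal sketch that captures both numbers in a single regexec pass instead of calling gsub twice. The helper name sort_by_run_and_index is made up for illustration, and it assumes every name ends in -R<number>_<number>.csv; names that do not match get NA keys and are sorted to the end.

```r
x <- c("W2345_S-001-R1_1.csv", "W2346_S-001-R1_10.csv", "W2347_S-001-R1_2.csv",
       "W2348_S-001-R1_9.csv", "W2345_S-001-R2_1.csv", "W2346_S-001-R2_10.csv",
       "W2347_S-001-R2_2.csv", "W2348_S-001-R2_9.csv")

# Hypothetical helper (not part of base R): order file names by the R number,
# then by the trailing index, comparing both as numbers rather than as text.
sort_by_run_and_index <- function(files) {
  # Each list element is c(full match, R number, trailing index),
  # or character(0) when the pattern does not match.
  m <- regmatches(files, regexec("-R([0-9]+)_([0-9]+)\\.csv$", files))
  run <- as.numeric(vapply(m, `[`, character(1), 2))
  idx <- as.numeric(vapply(m, `[`, character(1), 3))
  files[order(run, idx)]
}

sorted <- sort_by_run_and_index(x)
print(sorted)
```

Converting the captured groups with as.numeric is what makes 10 sort after 9; ordering the raw strings would compare them character by character.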

DATA

 dput(x)
c("W2345_S-001-R1_1.csv", "W2346_S-001-R1_10.csv", "W2347_S-001-R1_2.csv", 
"W2348_S-001-R1_9.csv", "W2345_S-001-R2_1.csv", "W2346_S-001-R2_10.csv", 
"W2347_S-001-R2_2.csv", "W2348_S-001-R2_9.csv")

CodePudding user response：

In your case, the first set of numbers (the one right after W) is not the one you want to order by, so first shorten each string to the part that should drive the sort. Use gsub to strip the prefix, then str_order with numeric = TRUE to order what is left numerically (gtools::mixedorder works equally well). Here v is the vector of file names:

library(stringr)
v[str_order(gsub("W.*1-","", v), numeric = TRUE)]

#[1] "W2345_S-001-R1_1.csv"  "W2347_S-001-R1_2.csv"  "W2348_S-001-R1_9.csv"  "W2346_S-001-R1_10.csv"
#[5] "W2345_S-001-R2_1.csv"  "W2347_S-001-R2_2.csv"  "W2348_S-001-R2_9.csv"  "W2346_S-001-R2_10.csv"

Regex explanation: "W.*1-" matches a substring that starts with W, ends with 1-, and has zero or more characters (.*) in between. Because .* is greedy, the match runs up to the last 1- in the string, which in these names is the end of 001-. Many different regexes would work here, so this is only one possibility.
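
To make that concrete, here is a small base R sketch (no packages needed) showing what the gsub call leaves behind for the R1 names, and why a plain character sort of the result is still not enough. The variable name keys is just for illustration.

```r
v <- c("W2345_S-001-R1_1.csv", "W2346_S-001-R1_10.csv",
       "W2347_S-001-R1_2.csv", "W2348_S-001-R1_9.csv")

# Strip everything from the leading W up to and including the last "1-"
keys <- gsub("W.*1-", "", v)
print(keys)
# "R1_1.csv" "R1_10.csv" "R1_2.csv" "R1_9.csv"

# A plain character sort compares text, not numbers, so "10" still lands
# before "2". method = "radix" is used so the result does not depend on locale.
print(sort(keys, method = "radix"))
# "R1_1.csv" "R1_10.csv" "R1_2.csv" "R1_9.csv"
```

This is why the answer orders the shortened strings with numeric = TRUE instead of sorting them as ordinary text.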