I am trying to expand on this answer by creating a solution that works on both new_dat and old_dat.

New Data

new_dat <- structure(list(`[0,25) east` = c(1269L, 85L), `[0,25) north` = c(364L, 
21L), `[0,25) south` = c(1172L, 97L), `[0,25) west` = c(549L, 
49L), `[100,250) east` = c(441L, 149L), `[100,250) north` = c(224L, 
45L), `[100,250) south` = c(521L, 247L), `[100,250) west` = c(770L, 
124L), `[100,500) east` = c(0L, 0L), `[100,500) north` = c(0L, 
0L), `[100,500) south` = c(0L, 0L), `[100,500) west` = c(0L, 
0L), `[1000,1000000] east` = c(53L, 0L), `[1000,1000000] north` = c(82L, 
0L), `[1000,1000000] south` = c(23L, 0L), `[1000,1000000] west` = c(63L, 
0L), `[1000,1500) east` = c(0L, 0L), `[1000,1500) north` = c(0L, 
0L), `[1000,1500) south` = c(0L, 0L), `[1000,1500) west` = c(0L, 
0L), `[1500,3000) east` = c(0L, 0L), `[1500,3000) north` = c(0L, 
0L), `[1500,3000) south` = c(0L, 0L), `[1500,3000) west` = c(0L, 
0L), `[25,100) east` = c(579L, 220L), `[25,100) north` = c(406L, 
58L), `[25,100) south` = c(1048L, 316L), `[25,100) west` = c(764L, 
131L), `[25,50) east` = c(0L, 0L), `[25,50) north` = c(0L, 0L
), `[25,50) south` = c(0L, 0L), `[25,50) west` = c(0L, 0L), `[250,500) east` = c(232L, 
172L), `[250,500) north` = c(207L, 40L), `[250,500) south` = c(202L, 
148L), `[250,500) west` = c(457L, 153L), `[3000,1000000] east` = c(0L, 
0L), `[3000,1000000] north` = c(0L, 0L), `[3000,1000000] south` = c(0L, 
0L), `[3000,1000000] west` = c(0L, 0L), `[50,100) east` = c(0L, 
0L), `[50,100) north` = c(0L, 0L), `[50,100) south` = c(0L, 0L
), `[50,100) west` = c(0L, 0L), `[500,1000) east` = c(103L, 0L
), `[500,1000) north` = c(185L, 0L), `[500,1000) south` = c(66L, 
0L), `[500,1000) west` = c(200L, 0L), `[500,1000000] east` = c(0L, 
288L), `[500,1000000] north` = c(0L, 120L), `[500,1000000] south` = c(0L, 
229L), `[500,1000000] west` = c(0L, 175L)), row.names = c("Andere akkerbouwbedrijven", 
"Andere combinatiebedrijven"), class = "data.frame")

Old data and original Solution

old_dat <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L, 
    `[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L, 
    `[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L, 
    `[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")

The solution makes use of the fact that the sum of the two numbers in each column name gives the correct order.

ord <- gsub("\\[|\\]|\\)", "", colnames(old_dat)) %>% 
         strsplit(",") %>% 
         lapply(as.numeric) %>% 
         lapply(sum) %>% 
         unlist %>% 
         order()

colnames(old_dat)[ord]

New approach

The new data has not only two numerical values but also a string value (east, north, south, west). I realised that I could use the same solution if I give east a value of 1, north a value of 2, and so on. The sum of the three numbers then still provides the correct order.

I have been having some trouble adapting the code though.

ord <- gsub("\\[|\\]|\\)", "", colnames(new_dat)) %>% 
         # provides "0,25 east", "0,25 north" etc

         strsplit(",") %>% 
         # provides "0" and "25 east", "0" and "25 north" etc

         lapply(as.numeric) %>% 
         lapply(sum) %>% 
         # SHOULD provide 0 25 1 (east), 0 25 2 (north) etc

         unlist %>% 
         order()

The issue lies in splitting the string into three parts and converting the direction to a number IF, and ONLY IF, there are three parts. Otherwise it should just use the two numbers. How should I do this?

CodePudding user response：

To build on your solution you can do,

library(magrittr)  # for %>%
library(stringi)   # for stri_replace_all_regex
ord <- gsub("\\D+", ",", stri_replace_all_regex(names(new_dat), '[A-Za-z]+', 1:4)) %>% 
     strsplit(",") %>% 
     lapply(as.numeric) %>% 
     lapply(sum, na.rm = TRUE) %>% 
     unlist() %>% 
     order()

> names(new_dat)[ord]
 [1] "[0,25) east"          "[0,25) south"         "[0,25) north"         "[0,25) west"          "[25,50) east"         "[25,50) south"        "[25,50) north"        "[25,50) west"         "[25,100) east"        "[25,100) south"      
[11] "[25,100) north"       "[25,100) west"        "[50,100) east"        "[50,100) south"       "[50,100) north"       "[50,100) west"        "[100,250) east"       "[100,250) south"      "[100,250) north"      "[100,250) west"      
[21] "[100,500) east"       "[100,500) south"      "[100,500) north"      "[100,500) west"       "[250,500) east"       "[250,500) south"      "[250,500) north"      "[250,500) west"       "[500,1000) east"      "[500,1000) south"    
[31] "[500,1000) north"     "[500,1000) west"      "[1000,1500) east"     "[1000,1500) south"    "[1000,1500) north"    "[1000,1500) west"     "[1500,3000) east"     "[1500,3000) south"    "[1500,3000) north"    "[1500,3000) west"    
[41] "[500,1000000] east"   "[500,1000000] south"  "[500,1000000] north"  "[500,1000000] west"   "[1000,1000000] east"  "[1000,1000000] south" "[1000,1000000] north" "[1000,1000000] west"  "[3000,1000000] east"  "[3000,1000000] south"
[51] "[3000,1000000] north" "[3000,1000000] west"
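
To see why na.rm = TRUE is needed, it can help to trace a single name through the pipeline. This is only a minimal base-R sketch, assuming the non-digit pattern is `\\D+` and the word pattern is `[A-Za-z]+`; the lookup `dir_index` is a hypothetical stand-in for the 1:4 replacement used above.

```r
# Hypothetical lookup: direction word -> numeric index
dir_index <- c(east = 1, north = 2, south = 3, west = 4)

nm <- "[25,100) south"

# Replace the direction word with its index: "[25,100) 3"
word  <- regmatches(nm, regexpr("[A-Za-z]+", nm))
step1 <- sub("[A-Za-z]+", as.character(dir_index[[word]]), nm)

# Collapse every run of non-digits into one comma: ",25,100,3"
step2 <- gsub("\\D+", ",", step1)

# Split and convert; the leading empty string becomes NA
parts <- as.numeric(strsplit(step2, ",")[[1]])

# na.rm = TRUE drops that NA, leaving 25 + 100 + 3
key <- sum(parts, na.rm = TRUE)
```

Without na.rm = TRUE the leading NA would make every sum NA, and order() would have nothing to sort on.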

CodePudding user response：

Maybe a little overkill but with this one you don't need to find the patterns "east", "south" etc.

library(magrittr)
order_cols <- function(dat) {
  
  # look for words to order by
  s_ordered <- stringi::stri_extract_all_regex(colnames(dat), "[[:alpha:]]+") %>% 
    unlist() %>% 
    unique() %>% 
    sort()
  
  if (length(s_ordered) > 1) {
    # replace words with their alphabetical index
    cnames <- stringi::stri_replace_all_fixed(colnames(dat), s_ordered, seq_along(s_ordered), vectorise_all = FALSE)
  } else {
    cnames <- colnames(dat)
  }
  
  cnames %>% 
    stringi::stri_extract_all_regex("\\d+") %>% # extract all numbers (including the alphabetical index numbers)
    lapply(as.numeric) %>% 
    lapply(sum) %>% 
    unlist() %>% 
    order()
  
}

In the first part of the function, I extract strings from the colnames and order them. Their order is then used to replace the words in the colnames with their indexes. Afterwards, I extract numeric values and pretty much follow your initial approach. I put this in a function to make it easier to use:

colnames(new_dat)[order_cols(new_dat)]
#>  [1] "[0,25) east"          "[0,25) north"         "[0,25) south"        
#>  [4] "[0,25) west"          "[25,50) east"         "[25,50) north"       
#>  [7] "[25,50) south"        "[25,50) west"         "[25,100) east"       
#> [10] "[25,100) north"       "[25,100) south"       "[25,100) west"       
#> [13] "[50,100) east"        "[50,100) north"       "[50,100) south"      
#> [16] "[50,100) west"        "[100,250) east"       "[100,250) north"     
#> [19] "[100,250) south"      "[100,250) west"       "[100,500) east"      
#> [22] "[100,500) north"      "[100,500) south"      "[100,500) west"      
#> [25] "[250,500) east"       "[250,500) north"      "[250,500) south"     
#> [28] "[250,500) west"       "[500,1000) east"      "[500,1000) north"    
#> [31] "[500,1000) south"     "[500,1000) west"      "[1000,1500) east"    
#> [34] "[1000,1500) north"    "[1000,1500) south"    "[1000,1500) west"    
#> [37] "[1500,3000) east"     "[1500,3000) north"    "[1500,3000) south"   
#> [40] "[1500,3000) west"     "[500,1000000] east"   "[500,1000000] north" 
#> [43] "[500,1000000] south"  "[500,1000000] west"   "[1000,1000000] east" 
#> [46] "[1000,1000000] north" "[1000,1000000] south" "[1000,1000000] west" 
#> [49] "[3000,1000000] east"  "[3000,1000000] north" "[3000,1000000] south"
#> [52] "[3000,1000000] west"


colnames(old_dat)[order_cols(old_dat)]
#>  [1] "[0,25)"         "[25,50)"        "[25,100)"       "[50,100)"      
#>  [5] "[100,250)"      "[100,500)"      "[250,500)"      "[500,1000)"    
#>  [9] "[1000,1500)"    "[1500,3000)"    "[500,1000000]"  "[1000,1000000]"
#> [13] "[3000,1000000]"

Created on 2022-05-06 by the reprex package (v2.0.1)
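
Adding the word index to the sum works here because the indices are much smaller than the gaps between interval sums. A possible variant, sketched in base R and not part of the function above, is to give order() two keys instead of one. It assumes every name is an interval optionally followed by a single word; `order_cols2` is a hypothetical name.

```r
order_cols2 <- function(nms) {
  # Sum of the numbers in each name, e.g. "[25,100) east" -> 125
  nums <- regmatches(nms, gregexpr("[0-9]+", nms))
  interval_key <- vapply(nums, function(x) sum(as.numeric(x)), numeric(1))

  # Trailing word, or "" when the name has none (as in old_dat)
  word_key <- sub("^[^A-Za-z]*", "", nms)

  # Primary key: interval sum; secondary key: word, alphabetically
  order(interval_key, word_key)
}

nms <- c("[25,100) west", "[0,25) north", "[25,50) east", "[0,25) east")
nms[order_cols2(nms)]
```

Because the word is only a tie-breaker, it can never push a column into a neighbouring interval, however many distinct words there are.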

P.S.: If you are using a newer version of R (>= 4.1.0), you can use the native pipe (|>) instead of magrittr's %>%.
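
For illustration, here is roughly what the sum-based ordering looks like with the native pipe. It is a base-R sketch on three made-up names rather than the full data, and it requires R 4.1.0 or later. Note that the right-hand side of |> must be a function call, so a bare `unlist` has to be written as `unlist()`.

```r
nms <- c("[100,250)", "[0,25)", "[25,100)")

# Extract the numbers from each name, sum them, and order by the sums
ord <- regmatches(nms, gregexpr("[0-9]+", nms)) |>
  lapply(as.numeric) |>
  lapply(sum) |>
  unlist() |>
  order()

nms[ord]
```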