How can I write a regex to sort file paths in numeric order?

Time:11-29

I have hundreds of .wav files that I imported using list.files(). They look something like this:

  [1] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"           
  [2] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"                   
  [3] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"  
.......
  [73] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"                       
  [74] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"                  
  [75] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"                    

I want to reorder the file paths so that the number in each subpath follows numeric order. I have tried the following:

filename <- file_list[order(as.numeric(stringr::str_extract(file_list, "[0-9]+(.*?)")))]

The result is something like:

  [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"                       
  [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"                  
  [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
.......
  [73] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"           
  [74] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"                   
  [75] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"  

I also want the last subpath to follow numeric order, e.g. English-0067 before English-0069. I tried repeating the matching for the last subpath, but that breaks the 3 ... 10 order from the first step. How can I make all the numbers in the subpaths follow numeric order?

Answer:

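Another option, which passes both numbers to a single order() call:

# files: the character vector of paths returned by list.files()
ord <- order(as.numeric(sub("^(\\d+)/.*$", "\\1", files)),       # leading folder number
             as.numeric(sub("^.*-(\\d+)\\.wav$", "\\1", files)))  # number before ".wav"

files[ord]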
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
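A self-contained sketch of the same approach on the six sample paths, printing both keys so you can check what each sub() call extracts before sorting. order() consults the second vector only to break ties in the first. The folder and num vectors at the end are hypothetical toy keys, chosen so that the two numbers disagree; they illustrate why sorting by the trailing number on its own would mix the folders up again.

```r
# The six sample paths from the question.
files <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

# Primary key: leading folder number ("10/..." -> 10).
folder_num <- as.numeric(sub("^(\\d+)/.*$", "\\1", files))
# Secondary key: digits right before ".wav" ("...-0701.wav" -> 701).
file_num <- as.numeric(sub("^.*-(\\d+)\\.wav$", "\\1", files))

folder_num
# [1] 10 10 10  3  3  3
file_num
# [1] 701 700 703  69  82  67

# file_num is only used where folder_num is tied.
order(folder_num, file_num)
# [1] 6 4 5 2 1 3

# Hypothetical toy keys where the two numbers disagree:
folder <- c(3, 10, 3)
num <- c(900, 5, 100)
order(num)          # 2 3 1: folder 10 lands first, breaking the folder order
order(folder, num)  # 3 1 2: folders stay 3, 3, 10; num orders within folder 3
```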

Answer:

Here's one approach:

vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", vec, proto = list(a = 0L, b = 0L))
nums
#    a   b
# 1 10 701
# 2 10 700
# 3 10 703
# 4  3  69
# 5  3  82
# 6  3  67
do.call(order, nums)
# [1] 6 4 5 2 1 3
vec[do.call(order, nums)]
# [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
# [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
# [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
# [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
# [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
# [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"    

If you also needed to include the BL-0001 number in your ordering, all it would take is a small addition to the regex and an additional entry in proto=. do.call(order, nums) handles however many columns nums has, one or more.
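A sketch of that extension, assuming every path contains exactly one /BL-<digits>_ segment as in the sample data. The column name bl is an arbitrary, hypothetical choice; proto= only needs one entry per capture group, in the same order as the groups.

```r
vec <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

# Three capture groups: leading folder number, the digits after "BL-",
# and the digits right before the file extension.
nums3 <- strcapture("^([0-9]+)/.*/BL-([0-9]+)_.*\\b([0-9]+)\\.[a-z]+$", vec,
                    proto = list(a = 0L, bl = 0L, b = 0L))

# do.call() passes each column to order() as a separate sort key.
do.call(order, nums3)
# [1] 6 4 5 2 1 3
vec[do.call(order, nums3)]
```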

Note that if your regex is too strict, rows that don't match it will return NA in both columns, so they sort last. If you find that one or more filenames are misordered, check the regex and the intermediate nums entries for those filenames.
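A small illustration of the NA behavior, using a hypothetical vector vec2 in which one entry (readme.txt) cannot match the pattern. complete.cases() is one convenient way to list the filenames the regex failed on.

```r
# Hypothetical vector: two paths that fit the pattern and one that does not.
vec2 <- c("10/a/x-0701.wav", "readme.txt", "3/a/y-0067.wav")
nums2 <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", vec2,
                    proto = list(a = 0L, b = 0L))

# The non-matching row is NA in every column.
is.na(nums2$a)
# [1] FALSE  TRUE FALSE

# order() puts NA keys last by default (na.last = TRUE),
# so "readme.txt" ends up at the end.
vec2[do.call(order, nums2)]

# Filenames the regex failed on:
vec2[!complete.cases(nums2)]
```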

Answer:

A tidyverse solution: structure the data as a table, use stringr::str_extract() to pull out the numbers, and sort the rows with dplyr::arrange() before extracting the filenames.

vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
          "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
          "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")

library(dplyr)
library(stringr)


vec_tib <- tibble(filename = vec)

vec_tib <- mutate(vec_tib,
                  num_1 = str_extract(filename, "\\d+"),
                  num_2 = str_extract(filename, "\\d+(?=(\\.wav))"))

head(vec_tib, 3)
#> # A tibble: 3 × 3
#>   filename                                                           num_1 num_2
#>   <chr>                                                              <chr> <chr>
#> 1 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsa… 10    0701 
#> 2 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch… 10    0700 
#> 3 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueb… 10    0703
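The num_2 pattern depends on a lookahead: (?=(\\.wav)) requires ".wav" to follow the digits without including it in the match, so only the digits are returned. A minimal base-R sketch of the same pattern, dropping the inner group since it is not needed; base R only supports lookahead when perl = TRUE is set.

```r
paths <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
)

# \\d+ matches a run of digits; (?=\\.wav) requires ".wav" to come next
# but leaves it out of the match.
regmatches(paths, regexpr("\\d+(?=\\.wav)", paths, perl = TRUE))
# [1] "0701" "0069"
```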

vec_tib <- mutate(vec_tib, across(starts_with("num"), as.numeric))

vec_tib |> 
  arrange(num_1, num_2) |> 
  pull(filename)
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"

Created on 2022-11-28 with reprex v2.0.2
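Why the as.numeric() step matters: str_extract() returns character strings, and character comparison works digit by digit, so "10" sorts before "3". The trailing numbers happen to be zero-padded to four digits in this sample, so they would sort correctly as text, but the folder numbers are not padded. A minimal base-R illustration using the folder numbers from the sample:

```r
# Folder numbers as str_extract() returns them: character strings.
num_1_chr <- c("10", "10", "10", "3", "3", "3")

# Text comparison: "1" < "3", so "10" comes first.
sort(unique(num_1_chr))
# [1] "10" "3"

# Numeric comparison puts 3 before 10.
sort(unique(as.numeric(num_1_chr)))
# [1]  3 10
```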
