I have hundreds of .wav files and imported their paths using list.files. The result looks something like this:
[1] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
[2] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
[3] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
.......
[73] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
[74] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
[75] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
I want to reorder the file paths so that the numbers in each subpath follow numeric order. I have tried the following:
filename <- file_list[order(as.numeric(stringr::str_extract(file_list, "[0-9]+(.*?)")))]
The result is something like:
[1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
[2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
[3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
.......
[73] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
[74] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
[75] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
I also want the last subpath to follow numeric order, e.g. English-0067; English-0069. I tried repeating the matching for the last subpath, but that breaks the earlier ordering of the leading folders (3...10). How can I make all the numbers in the subpaths follow numeric order?
CodePudding user response:
Another option, ordering first by the leading folder number and then by the trailing file number:
ord <- order(as.numeric(sub("(^\\d+)/.*$", "\\1", files)), as.numeric(sub("^.*-(\\d+)\\.wav", "\\1", files)))
files[ord]
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
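If every number in the path should act as a sort key, not just the first and last, the same idea can be generalised. The following is a minimal sketch rather than part of the answer above; it assumes every path contains the same count of digit runs, and the names all_nums and keys are made up for illustration.

```r
files <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

# Pull out every run of digits from each path (a list of character vectors).
all_nums <- regmatches(files, gregexpr("[0-9]+", files))

# Assumption: each path yields the same number of digit runs.
stopifnot(length(unique(lengths(all_nums))) == 1L)

# One numeric column per digit run, left to right.
keys <- as.data.frame(do.call(rbind, lapply(all_nums, as.numeric)))

# Sort by all columns in turn; earlier numbers take priority.
ord <- do.call(order, keys)
files[ord]
```

If some names contain extra digits, the column counts differ and the stopifnot() check fails, in which case a targeted regex per key is the safer route.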
CodePudding user response:
Here's one approach:
vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", vec, proto=list(a=0L,b=0L))
nums
#    a   b
# 1 10 701
# 2 10 700
# 3 10 703
# 4  3  69
# 5  3  82
# 6  3  67
do.call(order, nums)
# [1] 6 4 5 2 1 3
vec[do.call(order, nums)]
# [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
# [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
# [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
# [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
# [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
# [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
If you also need to include the BL-0001 code in your ordering, all it takes is a small addition to the regex and an additional entry in proto=. The call do.call(order, nums) handles one or more columns, however many there are.
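As a rough illustration of that extension, here is a sketch with a third capture group for the BL code. The regex and the column name bl are assumptions made for this example, not taken from the original post.

```r
vec <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

# Three groups: leading folder number, BL code, trailing file number.
# proto= needs one entry per capture group, in the same order.
nums3 <- strcapture(
  "^([0-9]+)/.*/BL-([0-9]+)_.*\\b([0-9]+)\\.[a-z]+$",
  vec,
  proto = list(a = 0L, bl = 0L, b = 0L)
)

# The ordering call is unchanged; it simply sees one more column.
ord3 <- do.call(order, nums3)
vec[ord3]
```

Because order() uses later columns only to break ties in earlier ones, the position of the new group in proto= decides its priority in the sort.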
Note that if you over-tune your regex, rows that don't match will return NA for both groups, which means the NA rows sort last. If you find that one or more filenames are misordered, check the regex and the intermediate nums entries for those filenames.
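One way to run that check is to list the paths whose row in nums is incomplete. This is a small sketch; the path "notes/readme.txt" is a made-up entry added only to trigger a non-match.

```r
paths <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "notes/readme.txt",  # hypothetical path that does not fit the pattern
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", paths,
                   proto = list(a = 0L, b = 0L))

# Rows where the regex failed are NA in every column.
unmatched <- paths[!complete.cases(nums)]
unmatched

# order() puts NA keys last by default, so unmatched paths end up at the bottom.
sorted <- paths[do.call(order, nums)]
sorted
```

Inspecting unmatched before sorting makes it easier to tell a regex problem from a genuinely unexpected filename.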
CodePudding user response:
A tidyverse solution: structure the data as a table and use stringr::str_extract() to pull out the numbers, then arrange the rows before extracting the filenames.
vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
library(dplyr)
library(stringr)
vec_tib <- tibble(filename = vec)
vec_tib <- mutate(vec_tib,
                  num_1 = str_extract(filename, "\\d+"),
                  num_2 = str_extract(filename, "\\d+(?=(\\.wav))"))
head(vec_tib, 3)
#> # A tibble: 3 × 3
#>   filename                                                           num_1 num_2
#>   <chr>                                                              <chr> <chr>
#> 1 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsa… 10    0701
#> 2 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch… 10    0700
#> 3 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueb… 10    0703
vec_tib <- mutate(vec_tib, across(starts_with("num"), as.numeric))
vec_tib |>
arrange(num_1, num_2) |>
pull(filename)
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
Created on 2022-11-28 with reprex v2.0.2