Can I get str_extract to take a piece of text in the middle of a file name?

Another program I use (Raven Pro) produces hundreds of .txt files, each containing nine variables with headers. I also need to record which file each row was pulled from.

I am using stringr::str_extract(names(dat_tab), ...) to get the file name into a data frame built with rbindlist. My problem is that I only want a portion of the file name included.

Here's an example of one of my file names:

BIOL10_20201206_180000.wav.Table01.txt

If I use "\\d+" to try to get the numbers, it only picks up the 10 before the first underscore, but the portion of the file name I need is 20201206_180000.

Any help to get around this is appreciated :)

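library(data.table)  # rbindlist() comes from data.table, not plyr
# pattern is a regular expression, not a glob
myfiles <- list.files(path = folder, pattern = "\\.txt$", full.names = FALSE)
dat_tab <- sapply(myfiles, read.table, header = TRUE, sep = "\t", simplify = FALSE, USE.NAMES = TRUE)
# "\\d+" matches only the first run of digits in each name
names(dat_tab) <- stringr::str_extract(names(dat_tab), "\\d+")
binded1 <- rbindlist(dat_tab, idcol = "files", fill = TRUE)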

This ends up with the file name coming in as "10" for the file "BIOL10_20201206_180000.wav.Table01.txt".

CodePudding user response：

You can specify the length:

library(stringr)
x <- "BIOL10_20201206_180000.wav.Table01.txt"
str_extract(x, "\\d{8}_\\d{6}")
# "20201206_180000"
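To see why the original attempt returned "10", here is a small sketch. The second file name is made up for illustration, in the same style as the one in the question. str_extract() returns only the first match in each string, so a bare "\\d+" stops at the digits in "BIOL10", while fixed widths skip over them.

```r
library(stringr)

# Hypothetical file names in the same style as the question's example
files <- c("BIOL10_20201206_180000.wav.Table01.txt",
           "BIOL10_20201207_063000.wav.Table01.txt")

# str_extract() keeps only the first match per string: the "10" in "BIOL10"
first_digits <- str_extract(files, "\\d+")

# str_extract_all() shows every digit run in the first name
all_digits <- str_extract_all(files[1], "\\d+")[[1]]

# Requiring 8 digits, an underscore, then 6 digits targets the date_time part
stamps <- str_extract(files, "\\d{8}_\\d{6}")
print(stamps)
```

This assumes every file name carries an 8-digit date and a 6-digit time; names that don't fit that shape would come back as NA.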

CodePudding user response：

A couple other options:

x <- "BIOL10_20201206_180000.wav.Table01.txt"

#option 1 (the "_" before the group skips the "10" in "BIOL10")
sub("^.*?_(\\d+_\\d+).*$", "\\1", x, perl = TRUE)
#> [1] "20201206_180000"

#option 2
stringr::str_extract(x, "(?<=_)\\d+_\\d+")
#> [1] "20201206_180000"
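Tying this back to the question, a rough sketch of how the extracted stamp could become the id column. The two small data frames stand in for the tables read from the Raven Pro files; the column name Selection and the second file name are invented for illustration.

```r
library(data.table)
library(stringr)

# Stand-ins for the tables read from disk (contents are hypothetical)
dat_tab <- list(
  "BIOL10_20201206_180000.wav.Table01.txt" = data.frame(Selection = 1:2),
  "BIOL10_20201207_063000.wav.Table01.txt" = data.frame(Selection = 1L)
)

# Replace each list name with the date_time part that follows the first "_"
names(dat_tab) <- str_extract(names(dat_tab), "(?<=_)\\d+_\\d+")

# idcol copies the list names into a "files" column, one value per row
binded1 <- rbindlist(dat_tab, idcol = "files", fill = TRUE)
print(binded1)
```

The renaming has to happen before rbindlist(), since idcol takes its values from the list names at that point.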