I am trying to extract a bunch of information from filenames using regular expressions in R. When I test the pattern, str_view() highlights the correct part of the string. Yet when I try to strip that part with gsub() and keep the remainder, it doesn't work. I also tried str_extract(), but that isn't working either. What am I doing wrong?
fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"
fext <- tools::file_path_sans_ext(fname)
stringr::str_view(fext, ".*-ASG_\\d+_", match = TRUE)
P_num <- gsub(".*-ASG_\\d{2}_", "", fext)
P_num <- stringr::str_extract(fname, "(?<=-ASG_\\d+)([^_])*(?=\\.tab)")
CodePudding user response:
Here is a simple approach using sub():
fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"
output <- sub("^.*-ASG_\\d+_(.*)\\.tab$", "\\1", fname)
output
[1] "00020"
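Above, we use a capture group to isolate the portion of the filename you want: everything between the -ASG_<digits>_ prefix and the .tab extension. Because the pattern is anchored with ^ and $, it matches the whole filename, and the replacement \\1 keeps only the captured group.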
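If it helps, here is a minimal sketch of two other base R ways to get the same piece, assuming every filename follows the -ASG_<digits>_<id>.tab layout from the question. The variable names p_from_ext and p_from_match are only illustrative.

```r
fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"

# Variant 1: drop the extension first, then strip everything up to
# and including the "-ASG_<digits>_" prefix.
fext <- tools::file_path_sans_ext(fname)
p_from_ext <- sub("^.*-ASG_\\d+_", "", fext)

# Variant 2: pull out the capture group directly with regexec()/regmatches().
# Element 1 of the result is the full match, element 2 is the first group.
m <- regmatches(fname, regexec("-ASG_\\d+_(.*)\\.tab$", fname))
p_from_match <- m[[1]][2]

print(p_from_ext)
print(p_from_match)
```

Both variants should give "00020" for the example filename. Note that sub() returns its input unchanged when the pattern does not match, so it may be worth checking with grepl() first if some filenames might not follow this layout.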