String extraction with regular expression in R

Time:10-09

I am trying to extract a bunch of information from filenames using regular expressions in R. When I match the pattern, str_view() shows me the correct set of strings. Yet when I try to substitute that part away with gsub() and keep the remaining portion, it doesn't work. I also tried str_extract(), but that doesn't work either. What am I doing wrong?

fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"

fext <- tools::file_path_sans_ext(fname)

stringr::str_view(fext, ".*-ASG_\\d+_", match = TRUE)

P_num <- gsub(".*-ASG_\\d{2}_", "", fext)

P_num <- stringr::str_extract(fname, "(?<=-ASG_\\d+)([^_])*(?=\\.tab)")
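A likely reason the str_extract() attempt fails: stringr uses the ICU regex engine, which only accepts look-behinds of bounded length, so an unbounded `\\d+` inside `(?<=...)` is rejected. The look-behind there also stops before the `_` that precedes `00020`, so the lookahead for `.tab` cannot succeed. A minimal sketch that avoids the look-behind entirely, swapping in base R's regexpr() with `perl = TRUE` and PCRE's `\K` (which drops everything matched before it from the reported match); this is an alternative technique, not the stringr API:

```r
# Filename taken from the question
fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"

# Match "-ASG_", one or more digits and "_", then \K discards that prefix,
# so only the run of non-underscore, non-dot characters that is directly
# followed by ".tab" at the end of the string is returned.
p_num <- regmatches(
  fname,
  regexpr("-ASG_\\d+_\\K[^_.]+(?=\\.tab$)", fname, perl = TRUE)
)
p_num
#> [1] "00020"
```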

Answer:

Here is a simple approach using sub:

fname <- "TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab"
output <- sub("^.*-ASG_\\d+_(.*)\\.tab$", "\\1", fname)
output

[1] "00020"

Above we use a capture group, `(.*)`, to isolate the portion of the filename between `-ASG_<digits>_` and the `.tab` extension; the replacement `\\1` then returns just that group.
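Since the goal is to pull several pieces of information out of each filename, the same capture-group idea extends to one group per field. The sketch below uses base R's regexec() with regmatches() to return every group at once. The field names (`id`, `date`, `time`, `prefix`, `asg`, `p_num`) are hypothetical labels chosen for illustration; the question does not say what each part of the filename means, and the pattern assumes every filename follows the same layout as the example.

```r
# Filename(s) to parse; only the example from the question is used here
fnames <- c("TC2L6C_2020-08-14_1516_6C-ASG_29_00020.tab")

# One capture group per underscore-separated field (layout assumed from the example)
pattern <- "^([^_]+)_(\\d{4}-\\d{2}-\\d{2})_(\\d{4})_([^_]+)-ASG_(\\d+)_([^_.]+)\\.tab$"

# regexec() records the full match plus each group; regmatches() extracts them
parts <- regmatches(fnames, regexec(pattern, fnames))

# Drop the full match (first element) and stack the groups into a matrix
fields <- do.call(rbind, lapply(parts, `[`, -1))

# Hypothetical column names, for illustration only
colnames(fields) <- c("id", "date", "time", "prefix", "asg", "p_num")
fields
```

Because regexec() and regmatches() are vectorized, passing a longer character vector of filenames (for example, from list.files()) yields one row per file.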
