Regular Expression in R (rename filename)

I have the following file names:

s <- c("(1) 1-B1-1_(miRNA-4_0).CEL", 
       "(10) 1-NEC 4-1_(miRNA-4_0).CEL", 
       "(11) 1-B5-1_(miRNA-4_0).CEL", 
       "(12) 1-B5-2_(miRNA-4_0).CEL")

How can I use a regular expression in R to extract only the part between the parenthesized number at the start and the first underscore (_)?

I want this:

1-B1-1, 1-B5-1, 1-B5-2 etc.

CodePudding user response:

A possible solution, based on stringr::str_extract and lookaround:

library(stringr)

s <- c("(1) 1-B1-1_(miRNA-4_0).CEL", "(10) 1-NEC 4-1_(miRNA-4_0).CEL", "(11) 1-B5-1_(miRNA-4_0).CEL", "(12) 1-B5-2_(miRNA-4_0).CEL")

str_extract(s, "(?<=\\s).*(?=_\\()")

#> [1] "1-B1-1"    "1-NEC 4-1" "1-B5-1"    "1-B5-2"
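The same lookaround pattern can also be used without stringr. The following is a minimal sketch, assuming base R's regexpr() with perl = TRUE (lookbehind and lookahead need the PCRE engine); the variable names m and ids are only illustrative.

```r
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL",
       "(10) 1-NEC 4-1_(miRNA-4_0).CEL",
       "(11) 1-B5-1_(miRNA-4_0).CEL",
       "(12) 1-B5-2_(miRNA-4_0).CEL")

# (?<=\\s)  : the match must be preceded by a whitespace character
# .*        : take as much as possible ...
# (?=_\\()  : ... up to the point where "_(" follows
m <- regexpr("(?<=\\s).*(?=_\\()", s, perl = TRUE)

# regmatches() returns only the matched substrings; elements of s
# without a match would be dropped rather than returned as NA
ids <- regmatches(s, m)
print(ids)
```

Note that, unlike str_extract, this drops non-matching inputs instead of returning NA, so the result can be shorter than s if some file names do not follow the pattern.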

CodePudding user response:

sub('\\S+ (.*?)_.*', '\\1', s)
[1] "1-B1-1"    "1-NEC 4-1" "1-B5-1"    "1-B5-2"  

str_extract(s, '(?<= ).*?(?=_)')
[1] "1-B1-1"    "1-NEC 4-1" "1-B5-1"    "1-B5-2"

Note I stole s from @PaulS
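One difference between the two approaches is worth keeping in mind: sub() returns the input unchanged when the pattern does not match, whereas str_extract returns NA. Below is a small sketch of how this could be guarded against in base R, with the sub() pattern written using \\S+ for the parenthesized prefix. The file name "README.txt" is a made-up example, not one of the original files.

```r
# The second element is a hypothetical file name that does not follow the pattern
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL", "README.txt")

pattern <- '\\S+ (.*?)_.*'

# sub() silently leaves non-matching strings as they are
res <- sub(pattern, '\\1', s)
print(res[2])

# Check for a match first and use NA where the pattern does not apply
matched <- grepl(pattern, s)
ids <- ifelse(matched, sub(pattern, '\\1', s), NA_character_)
print(ids)
```

Checking with grepl() first makes it obvious which file names did not follow the expected layout.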

CodePudding user response:

Here is a base R solution using gsub twice:

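'\\(\\d+\\)' matches one or more digits in parentheses

'_[^_]*' matches an underscore and everything after it up to the next underscore; with gsub, this removes everything from the first underscore onward

trimws(gsub('\\(\\d+\\)', '', gsub('_[^_]*', '', s)))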

[1] "1-B1-1"    "1-NEC 4-1" "1-B5-1"    "1-B5-2"
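To see what each call contributes, the nested expression can be unrolled into separate steps. This is only an illustrative breakdown, assuming the digit pattern is '\\(\\d+\\)' (one or more digits in parentheses); step1, step2 and step3 are names introduced here.

```r
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL",
       "(10) 1-NEC 4-1_(miRNA-4_0).CEL",
       "(11) 1-B5-1_(miRNA-4_0).CEL",
       "(12) 1-B5-2_(miRNA-4_0).CEL")

# Step 1: remove every "_" together with the non-underscore characters after it,
# which strips everything from the first underscore onward
step1 <- gsub('_[^_]*', '', s)
print(step1)

# Step 2: remove the leading number in parentheses; a leading space remains
step2 <- gsub('\\(\\d+\\)', '', step1)
print(step2)

# Step 3: trim the leftover whitespace
step3 <- trimws(step2)
print(step3)
```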