I have the following file names:
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL",
"(10) 1-NEC 4-1_(miRNA-4_0).CEL",
"(11) 1-B5-1_(miRNA-4_0).CEL",
"(12) 1-B5-2_(miRNA-4_0).CEL")
How can I extract only the part between the parenthesized number at the start and the underscore
with regular expressions in R?
I want this:
1-B1-1, 1-B5-1, 1-B5-3 etc.
CodePudding user response:
A possible solution, based on stringr::str_extract and lookaround:
library(stringr)
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL", "(10) 1-NEC 4-1_(miRNA-4_0).CEL", "(11) 1-B5-1_(miRNA-4_0).CEL", "(12) 1-B5-2_(miRNA-4_0).CEL")
str_extract(s, "(?<=\\s).*(?=_\\()")
#> [1] "1-B1-1" "1-NEC 4-1" "1-B5-1" "1-B5-2"
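If adding stringr is not an option, the same lookaround pattern should also work in base R. This is a minimal sketch, assuming every file name contains a space before the ID and "_(" after it; the variable names m and ids are only illustrative. Note that regmatches() silently drops elements with no match, so the result can be shorter than s if some names do not fit the pattern.

```r
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL",
       "(10) 1-NEC 4-1_(miRNA-4_0).CEL",
       "(11) 1-B5-1_(miRNA-4_0).CEL",
       "(12) 1-B5-2_(miRNA-4_0).CEL")

# perl = TRUE is required: lookbehind/lookahead are not supported
# by R's default regex engine
m <- regexpr("(?<=\\s).*(?=_\\()", s, perl = TRUE)

# Pull out the matched substrings (non-matching elements are dropped)
ids <- regmatches(s, m)
ids
```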
CodePudding user response:
sub('\\S+ (.*?)_.*', '\\1', s)
[1] "1-B1-1" "1-NEC 4-1" "1-B5-1" "1-B5-2"
str_extract(s, '(?<= ).*?(?=_)')
[1] "1-B1-1" "1-NEC 4-1" "1-B5-1" "1-B5-2"
Note: I stole s from @PaulS
CodePudding user response:
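Here is a base R solution using gsub twice:
\\(\\d+\\)
matches the number in parentheses
'_[^_]*'
matches each underscore together with the non-underscore characters after it, so everything from the first underscore onward is removed
trimws(gsub('\\(\\d+\\)', '', gsub('_[^_]*', '', s)))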
[1] "1-B1-1" "1-NEC 4-1" "1-B5-1" "1-B5-2"
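To see what each gsub call contributes, the nested expression can be unrolled into separate steps. This is only a sketch of the same idea; step1, step2 and ids are illustrative names, and it assumes the leading number in parentheses consists of one or more digits.

```r
s <- c("(1) 1-B1-1_(miRNA-4_0).CEL",
       "(10) 1-NEC 4-1_(miRNA-4_0).CEL",
       "(11) 1-B5-1_(miRNA-4_0).CEL",
       "(12) 1-B5-2_(miRNA-4_0).CEL")

# Step 1: remove every underscore plus the run of non-underscore
# characters after it, i.e. everything from the first "_" onward
step1 <- gsub("_[^_]*", "", s)

# Step 2: remove the number in parentheses at the start
step2 <- gsub("\\(\\d+\\)", "", step1)

# Step 3: strip the leading space left behind by step 2
ids <- trimws(step2)
ids
```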