How can I remove all characters and digits up to and including the last "_"? For example:
> char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA","SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
"JH-HL_GCTCGGTAAGCATGTCGCCACATA","HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")
> c("GCTCGGTAAGCACCTCGCCACATA","ACTCGGTAAGCACCTCGCCACATA",
"GCTCGGTAAGCATGTCGCCACATA","GCTCGGTAAGCACCTCGCCACATA")
[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
[4] "GCTCGGTAAGCACCTCGCCACATA"
Can I do this with the str_replace function from the tidyverse?
CodePudding user response:
You can do this with sub:
sub('.*_', '', char)
#[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA"
#[3] "GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"
Or, if you prefer stringr functions:
stringr::str_remove(char, '.*_')
stringr::str_replace(char, '.*_', '')
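The reason '.*_' works is that .* is greedy, so the match extends to the last underscore rather than the first. Here is a minimal base R sketch of the difference; the variable x and the shortened sequence are made up for illustration and are not from the question:

```r
# Hypothetical shortened example string (not from the question)
x <- "SRR04_d3_GCTCGG"

# Greedy: '.*' consumes as much as possible, so the match ends at the LAST underscore
greedy <- sub(".*_", "", x)

# Lazy: '.*?' consumes as little as possible, so the match ends at the FIRST underscore
lazy <- sub(".*?_", "", x, perl = TRUE)

print(greedy)  # [1] "GCTCGG"
print(lazy)    # [1] "d3_GCTCGG"
```

So if you ever need to drop only the first prefix, the lazy form is the one to reach for.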
CodePudding user response:
The stringr package can be used to extract all the letters at the end of the string with:
library(stringr)
str_extract(char, "[[:alpha:]]*$")
# [1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
# [4] "GCTCGGTAAGCACCTCGCCACATA"
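If you would rather not load stringr, a rough base R counterpart of the same idea uses regexpr and regmatches. This is only a sketch: it assumes every string ends in a run of letters, because regmatches silently drops elements that do not match. The name tail_letters is made up for illustration.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# Locate the run of letters anchored at the end of each string
m <- regexpr("[[:alpha:]]+$", char)

# Pull out the matched substrings (non-matching elements would be dropped)
tail_letters <- regmatches(char, m)
print(tail_letters)
```

Note that this, like the str_extract call, would cut the result short if the part after the last underscore ever contained a digit.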
CodePudding user response:
I would phrase your problem using gsub with the pattern [^\W_]+_ (and perl = TRUE, so that \W inside the brackets is interpreted unambiguously). This will target one or more alphanumeric characters followed by an underscore, any number of times. Note that "-" is not alphanumeric, so the "JH-" in the third string is left in place.
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA","SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
"JH-HL_GCTCGGTAAGCATGTCGCCACATA","HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")
output <- gsub("[^\\W_]+_", "", char, perl = TRUE)
output
[1] "GCTCGGTAAGCACCTCGCCACATA"    "ACTCGGTAAGCACCTCGCCACATA"
[3] "JH-GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"
CodePudding user response:
In base R, you can also use strsplit and sapply:
> sapply(strsplit(char, '_'), tail, n=1)
[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"
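A slightly stricter variant of the same idea, as a sketch: vapply checks that each result is a single string, and fixed = TRUE treats "_" as a literal rather than a regex. The names parts and last_part are made up for illustration.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# Split each string on literal underscores
parts <- strsplit(char, "_", fixed = TRUE)

# Keep the last piece of each split; vapply enforces one character value per element
last_part <- vapply(parts, function(p) p[length(p)], character(1))
print(last_part)
```

Unlike sapply, vapply raises an error instead of quietly returning a list if some element does not yield exactly one string (for example, an empty input string).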
CodePudding user response:
Here is an alternative way:
library(stringr)
str_replace_all(char, ".*_(?=[^:]+$)", "")
output:
[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
[4] "GCTCGGTAAGCACCTCGCCACATA"