Remove any characters before "_"

Time:10-10

How can I remove all characters and digits up to and including the last "_"? For example:

char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

Desired output:

[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
[4] "GCTCGGTAAGCACCTCGCCACATA"

Can I do this with the str_replace() function from stringr (tidyverse)?

Answer:

You can do this with sub():

sub('.*_', '', char)

#[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA"
#[3] "GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"

Or, if you prefer stringr functions:

stringr::str_remove(char, '.*_')
stringr::str_replace(char, '.*_', '')
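Why ".*_" strips everything up to the last underscore, rather than the first: ".*" is greedy, so the match extends as far right as it can while still ending in "_". The sketch below contrasts it with "^[^_]*_", which cannot cross an underscore and therefore stops at the first one. The third input string, "NOUNDERSCORE", is a hypothetical example added to show that strings without "_" are returned unchanged; the variable names are illustrative.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "JH-HL_GCTCGGTAAGCATGTCGCCACATA",
          "NOUNDERSCORE")  # hypothetical string with no "_"

# Greedy ".*" grabs as much as possible, so the match ends at the LAST "_"
last <- sub(".*_", "", char)
# c("GCTCGGTAAGCACCTCGCCACATA", "GCTCGGTAAGCATGTCGCCACATA", "NOUNDERSCORE")

# "[^_]*" cannot match an underscore, so the match ends at the FIRST "_"
first <- sub("^[^_]*_", "", char)
# c("d3_GCTCGGTAAGCACCTCGCCACATA", "GCTCGGTAAGCATGTCGCCACATA", "NOUNDERSCORE")
```

When no underscore is present, sub() finds no match and leaves the string as it was.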

Answer:

With the stringr package you can extract the run of letters at the end of each string:

library(stringr)
str_extract(char, "[[:alpha:]]*$")
# [1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
# [4] "GCTCGGTAAGCACCTCGCCACATA"
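For comparison, a minimal base R sketch of the same extraction using regexpr() and regmatches(). The second pattern, "[^_]*$", is a swapped-in alternative (not from the answer above) that keeps everything after the last "_", which may matter if the trailing part could ever contain digits, since "[[:alpha:]]*$" only matches letters. Variable names are illustrative.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# regexpr() locates the trailing run of letters in each string;
# regmatches() returns the matched substrings
tail_letters <- regmatches(char, regexpr("[[:alpha:]]*$", char))

# Alternative: everything after the last "_", whatever characters it contains
after_last_underscore <- regmatches(char, regexpr("[^_]*$", char))
```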

Answer:

I would approach this with gsub() and the pattern [^\W_]+_. This targets one or more alphanumeric characters followed by an underscore, as many times as it occurs in the string.

char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA","SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
      "JH-HL_GCTCGGTAAGCATGTCGCCACATA","HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")
output <- gsub("[^\\W_]+_", "", char)
output

[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA"
[3] "GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"
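If the intent is simply "any run of non-underscore characters followed by an underscore", a sketch with the plainer bracket expression "[^_]+_" expresses that directly and does not rely on how a given regex engine treats \W inside brackets. This pattern is a substitution for illustration, not the answer's own.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# "[^_]+_" = one or more non-underscore characters, then "_".
# gsub() removes every such prefix, e.g. "SRR04_" and then "d3_",
# leaving only the text after the last "_".
output <- gsub("[^_]+_", "", char)
```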

Answer:

In base R, you can also use strsplit() and sapply() to keep the last piece of each string:

> sapply(strsplit(char, '_'), tail, n=1)
[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA" "GCTCGGTAAGCACCTCGCCACATA"
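A variant of the same split-and-take-last idea, sketched with vapply() so the result is guaranteed to be a character vector with one element per input, and with fixed = TRUE so "_" is matched literally instead of as a regex. The helper names are illustrative.

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# Split each string on a literal "_"
pieces <- strsplit(char, "_", fixed = TRUE)

# Keep the last piece; vapply() checks that each result is a single string
last_piece <- vapply(pieces, function(x) x[length(x)], character(1))
```

One caveat: strsplit() drops an empty piece after a trailing separator, so for an input such as "abc_" this returns "abc" rather than "".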

Answer:

Here is an alternative way:

library(stringr)
str_replace_all(char, ".*_(?=[^:]+$)", "")

Output:

[1] "GCTCGGTAAGCACCTCGCCACATA" "ACTCGGTAAGCACCTCGCCACATA" "GCTCGGTAAGCATGTCGCCACATA"
[4] "GCTCGGTAAGCACCTCGCCACATA"
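The lookahead idea can also be sketched in base R, since sub() supports lookahead when perl = TRUE. This version uses "[^_]+$" in the lookahead instead of "[^:]+$" (a swapped-in choice), so it only strips when at least one non-underscore character follows the last "_". The input "abc_" is a hypothetical edge case showing the difference from plain ".*_".

```r
char <- c("SRR04_d3_GCTCGGTAAGCACCTCGCCACATA", "SRR04_d1_ACTCGGTAAGCACCTCGCCACATA",
          "JH-HL_GCTCGGTAAGCATGTCGCCACATA", "HZ04_d5_GCTCGGTAAGCACCTCGCCACATA")

# ".*_" is greedy and ends at the last "_"; "(?=[^_]+$)" requires one or more
# non-underscore characters to follow, without consuming them
out <- sub(".*_(?=[^_]+$)", "", char, perl = TRUE)

# Hypothetical edge case: nothing follows the last "_", so there is no match
edge_lookahead <- sub(".*_(?=[^_]+$)", "", "abc_", perl = TRUE)  # "abc_"
edge_plain <- sub(".*_", "", "abc_")                              # ""
```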