gsub long complex strings

Time:06-11

I have the following string: BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene. I want to remove everything except BM1C-18. I tried standard gsub, but it does not work because of the spaces and symbols in the string. Any ideas how to solve this?

CodePudding user response:

A possible solution, using stringr::str_extract:

library(stringr)

string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

str_extract(string, "BM1C-18")

#> [1] "BM1C-18"
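If the identifier is not always literally BM1C-18, a pattern can describe its shape instead. The following is only a sketch: it assumes the ID sits at the start of the string and consists of uppercase letters or digits, a hyphen, then digits. The second input string is made up for illustration.

```r
library(stringr)

strings <- c(
  "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene",
  "XY2-7 hypothetical protein"  # hypothetical second example, not from the question
)

# Assumed ID shape: uppercase letters/digits, a hyphen, then digits,
# anchored at the start of the string
ids <- str_extract(strings, "^[A-Z0-9]+-[0-9]+")
ids
#> [1] "BM1C-18" "XY2-7"
```

Adjust the character classes if your real identifiers contain lowercase letters or other separators.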

To deal with a list of strings:

library(stringr)

string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
lStrings <- list(string, string, string)

str_extract(lStrings, "BM1C-18")

#> [1] "BM1C-18" "BM1C-18" "BM1C-18"
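One behaviour worth knowing when applying this to many strings: str_extract returns NA for any element in which the pattern does not occur, so the result keeps the same length as the input. A small sketch, where the second string is invented for illustration:

```r
library(stringr)

strings <- c(
  "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene",
  "hypothetical protein, no ID"  # made-up string without the ID
)

# Elements without a match come back as NA rather than being dropped
res <- str_extract(strings, "BM1C-18")
res
#> [1] "BM1C-18" NA
```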

With a column of strings in a dataframe:

library(tidyverse)

string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

df <- data.frame(col1 = rep(string, 3))

df %>% 
 mutate(col1 = str_extract(col1, "BM1C-18"))

#>      col1
#> 1 BM1C-18
#> 2 BM1C-18
#> 3 BM1C-18

CodePudding user response:

In addition to PaulS's solution: if BM1C-18 is always the first word of the string, you could do:

sub(" .*", "", string)

[1] "BM1C-18"

or:

library(stringr)
word(string, 1)

[1] "BM1C-18"
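As for why plain gsub struggles with this string: by default the pattern is treated as a regular expression, so the unescaped parentheses in "(ORF2)" act as a grouping construct and no longer match literal "(" and ")". A sketch of two base R ways around this follows; the variable name unwanted is just for illustration.

```r
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# Option 1: literal removal. fixed = TRUE turns off regex interpretation,
# so "(", ")" and ";" are matched as plain characters
unwanted <- " ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
literal_result <- gsub(unwanted, "", string, fixed = TRUE)
literal_result
#> [1] "BM1C-18"

# Option 2: regex that captures the first run of non-whitespace characters
# and replaces the whole string with that capture
regex_result <- sub("^([^[:space:]]+).*$", "\\1", string)
regex_result
#> [1] "BM1C-18"
```

Option 1 only works when the text to remove is known exactly; option 2 makes the same assumption as sub(" .*", "", string), namely that the part to keep is the first word.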