I have the following string:
BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene
from which I want to remove everything except BM1C-18. I tried the standard gsub, but it is not working because of the spaces and symbols within the string. Any ideas how to solve this?
Answer:
A possible solution, using stringr::str_extract:
library(stringr)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
str_extract(string, "BM1C-18")
#> [1] "BM1C-18"
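If the identifier is not always literally BM1C-18 but is always the first token of the string, the pattern can match it generically instead of hard-coding it. A minimal sketch, assuming the ID is the leading run of non-space characters:

```r
library(stringr)

string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# "^\\S+" = one or more non-whitespace characters anchored at the start,
# i.e. the first token, whatever its exact value is (assumption: the ID
# never contains a space)
id <- str_extract(string, "^\\S+")
id
#> [1] "BM1C-18"
```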
To deal with a list of strings:
library(stringr)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
lStrings <- list(string, string, string)
str_extract(lStrings, "BM1C-18")
#> [1] "BM1C-18" "BM1C-18" "BM1C-18"
With a column of strings in a dataframe:
library(tidyverse)
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
df <- data.frame(col1 = rep(string, 3))
df %>%
mutate(col1 = str_extract(col1, "BM1C-18"))
#> col1
#> 1 BM1C-18
#> 2 BM1C-18
#> 3 BM1C-18
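If you would rather not load the whole tidyverse, the same column update works with stringr alone by assigning the result back to the column. The third row below is a made-up example without the ID, included only to show that non-matching rows become NA, just as they would with mutate():

```r
library(stringr)

string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"
# Hypothetical third row that does not contain the ID
df <- data.frame(col1 = c(string, string, "no identifier here"))

# Overwrite the column with the extracted ID; rows without a match get NA
df$col1 <- str_extract(df$col1, "BM1C-18")
df$col1
```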
Answer:
In addition to PaulS's solution, in case BM1C-18 is always the first word of the string, you could do:
sub(" .*", "", string)
[1] "BM1C-18"
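A likely reason a plain gsub call fails on this string is that parentheses are regex metacharacters: "(ORF2)" in a pattern is a capture group that matches ORF2 without the parentheses, so a pattern copied verbatim from the text never matches. A sketch, assuming the gsub attempt tried to delete the known tail of the string (the tail_text variable is hypothetical):

```r
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# Hypothetical: the text after the ID, written out literally
tail_text <- " ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# As a regex, the parentheses form groups, so the pattern looks for
# "ORF2 ORF2 gene; ..." and nothing matches: the string is returned as-is
unchanged <- gsub(tail_text, "", string)

# fixed = TRUE makes gsub treat the pattern literally, parentheses included
cleaned <- gsub(tail_text, "", string, fixed = TRUE)
cleaned
#> [1] "BM1C-18"
```

The sub(" .*", "", string) approach avoids this issue entirely because its pattern contains no literal text from the string: it matches the first space and everything after it.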
or:
library(stringr)
word(string, 1)
[1] "BM1C-18"
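word() splits on single spaces by default, so a base R equivalent without stringr is to split on " " and keep the first piece. A minimal sketch that also works when string is a vector of several strings:

```r
string <- "BM1C-18 ORF2 (ORF2) gene; and nonfunctional ORF1 (ORF1) gene"

# strsplit() returns a list with one character vector per input string;
# vapply() takes element 1 of each and guarantees a character result
first_word <- vapply(strsplit(string, " ", fixed = TRUE), `[`, character(1), 1)
first_word
#> [1] "BM1C-18"
```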