I have the following data set:
  Category         Term                                             Count
1 GOTERM_MF_DIRECT GO:0005102~receptor binding                          3
2 GOTERM_MF_DIRECT GO:0008139~nuclear localization sequence binding     2
3 GOTERM_CC_DIRECT GO:0016021~integral component of membrane            9
4 GOTERM_CC_DIRECT GO:0071564~npBAF complex                             3
I want to keep only the MF/CC part of the Category column and, in the Term column, keep only the text after the "~" (dropping the GO:001.. identifier). I can do it with loops, but is there a more elegant way to achieve this? Thanks in advance!
CodePudding user response:
You could do
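library(dplyr)
df %>%
  # characters 8-9 of "GOTERM_MF_DIRECT" hold the ontology code (MF, CC)
  mutate(Category = substr(Category, 8, 9),
         # remove everything up to and including the first "~"
         Term = stringr::str_remove(Term, "^.*?~"))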
Output:
  Category Term                                  Count
  <chr>    <chr>                                 <dbl>
1 MF       receptor binding                          3
2 MF       nuclear localization sequence binding     2
3 CC       integral component of membrane            9
4 CC       npBAF complex                             3
Data:
df <- tibble::tribble(
~Category, ~Term, ~Count,
"GOTERM_MF_DIRECT", "GO:0005102~receptor binding", 3,
"GOTERM_MF_DIRECT", "GO:0008139~nuclear localization sequence binding", 2,
"GOTERM_CC_DIRECT", "GO:0016021~integral component of membrane", 9,
"GOTERM_CC_DIRECT", "GO:0071564~npBAF complex", 3
)
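The substr() call above relies on every Category value starting with the seven characters "GOTERM_" and using a two-letter code. If that is not guaranteed, a pattern-based variant in base R may be safer. The following is only a sketch: it assumes the code you want always sits between the first and second underscore, and it rebuilds the sample data as a plain data.frame so it runs without any packages.

```r
# Same sample data as above, built with base R only
df <- data.frame(
  Category = c("GOTERM_MF_DIRECT", "GOTERM_MF_DIRECT",
               "GOTERM_CC_DIRECT", "GOTERM_CC_DIRECT"),
  Term = c("GO:0005102~receptor binding",
           "GO:0008139~nuclear localization sequence binding",
           "GO:0016021~integral component of membrane",
           "GO:0071564~npBAF complex"),
  Count = c(3, 2, 9, 3)
)

# Keep only the text between the first and second underscore (MF, CC, ...)
df$Category <- sub("^[^_]*_([^_]+)_.*$", "\\1", df$Category)

# Drop everything up to and including the first "~"
df$Term <- sub("^[^~]*~", "", df$Term)

df
```

Values that do not match a pattern are left unchanged by sub(), so rows with an unexpected format would stay as they are rather than being truncated.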