I have a dataframe such as:
COL1 Values
G1 dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]
G2 DNA integration [GO:0015074]; DNA recombination [GO:0006310]
G3 response to antibiotic [GO:0046677]
G4
G5 transcription, DNA-templated [GO:0006351]
I would like to know how I can use the gsub function to extract only the elements between the [] in the Values column, so that I get:
COL1 Values
G1 GO:0006231;GO:0006235;GO:0032259
G2 GO:0015074;GO:0006310
G3 GO:0046677
G4
G5 GO:0006351
Here is the data, in case it helps:
structure(list(COL1 = c("G1", "G2", "G3", "G4", "G5"), Values = c("dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]",
"DNA integration [GO:0015074]; DNA recombination [GO:0006310]",
"response to antibiotic [GO:0046677]", "", "transcription, DNA-templated [GO:0006351]"
)), class = "data.frame", row.names = c(NA, -5L))
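You can use regmatches and gregexpr, then collapse each row's matches with paste:
# (?<=\\[) and (?=\\]) are lookarounds, so the brackets themselves are excluded from each match; rows with no match become ""
df$Values <- vapply(regmatches(df$Values, gregexpr("(?<=\\[).*?(?=\\])", df$Values, perl = TRUE)), paste, character(1), collapse = ";")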
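Since the question asks about gsub specifically, here is a minimal sketch of a gsub-only variant (not part of the original answer). It assumes every non-empty entry consists of "description [ID]" items separated by "; ", and it rebuilds the example data with data.frame() rather than the dput output.

```r
# Rebuild the example data from the question
df <- data.frame(
  COL1 = c("G1", "G2", "G3", "G4", "G5"),
  Values = c(
    "dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]",
    "DNA integration [GO:0015074]; DNA recombination [GO:0006310]",
    "response to antibiotic [GO:0046677]",
    "",
    "transcription, DNA-templated [GO:0006351]"
  ),
  stringsAsFactors = FALSE
)

# [^;]* swallows each item's description (including the leading space after "; "),
# the capture group keeps only the text inside [...], and "\\1" writes it back.
# The ";" separators are never part of a match, so they survive unchanged.
df$Values <- gsub("[^;]*\\[([^\\]]+)\\][^;]*", "\\1", df$Values, perl = TRUE)

print(df)
```

One difference from the regmatches version: a non-empty entry without any [...] would be left unchanged by gsub instead of becoming an empty string, so this variant relies on the data always carrying the bracketed IDs.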