regex to keep every element between two characters in column in R

Time:10-15

I have a data frame such as:

COL1 Values
G1   dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]
G2   DNA integration [GO:0015074]; DNA recombination [GO:0006310]
G3   response to antibiotic [GO:0046677]
G4
G5   transcription, DNA-templated [GO:0006351]

I would like to know how I can use the gsub function to extract only the elements between the [] in the Values column,

and get:

COL1  Values
G1    GO:0006231;GO:0006235;GO:0032259
G2    GO:0015074;GO:0006310
G3    GO:0046677
G4
G5    GO:0006351

Here is the data, if it helps:

structure(list(COL1 = c("G1", "G2", "G3", "G4", "G5"), Values = c("dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]", 
"DNA integration [GO:0015074]; DNA recombination [GO:0006310]", 
"response to antibiotic [GO:0046677]", "", "transcription, DNA-templated [GO:0006351]"
)), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response:

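You can use regmatches and gregexpr. Use a lookbehind for [ and a lookahead for ] so that the brackets themselves are not part of the match (with the two swapped, each match would still include the [ and ]):

regmatches(df$Values, gregexpr("(?<=\\[).*?(?=\\])", df$Values, perl = TRUE))

This returns a list with one character vector of matches per row; rows without any brackets give character(0).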
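To get from the list of matches to the single semicolon-separated column shown in the question, each row's matches can be pasted back together. The following is a minimal sketch, assuming the data frame is named df as in the answer and is built from the question's sample data. It uses a lookbehind for [ and a lookahead for ], so the brackets are left out of each match. The intermediate name matches is only illustrative.

```r
# Sample data from the question
df <- data.frame(
  COL1 = c("G1", "G2", "G3", "G4", "G5"),
  Values = c(
    "dTMP biosynthetic process [GO:0006231]; dTTP biosynthetic process [GO:0006235]; methylation [GO:0032259]",
    "DNA integration [GO:0015074]; DNA recombination [GO:0006310]",
    "response to antibiotic [GO:0046677]",
    "",
    "transcription, DNA-templated [GO:0006351]"
  ),
  stringsAsFactors = FALSE
)

# For each row, collect every substring that sits between "[" and "]".
# The lookbehind/lookahead keep the brackets out of the match.
matches <- regmatches(
  df$Values,
  gregexpr("(?<=\\[).*?(?=\\])", df$Values, perl = TRUE)
)

# Collapse each row's matches into one ";"-separated string.
# Rows with no match (character(0)) collapse to the empty string "".
df$Values <- vapply(matches, paste, character(1), collapse = ";")

print(df)
```

vapply is used rather than sapply so that the result is guaranteed to be a character vector with one element per row, including the empty G4 row.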