gsub regex method to extract multiple quoted strings in R column

Time:11-06

I have strings like this --

c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")

I would like to use gsub() and regex to extract those integer values within each quote.

Output would be something like,

101 0 0 0

They can also have a delimiter. I would like to take the sum of each list in the row for my final output.

I have a crappy method in R that extracts it all but doesn't insert a delimiter or space, and wanted to ask for help in doing that. Or just a better method if anyone has one. :)

Currently using....

<- as.data.frame(gsub('.*?"(.*?)".*? ', "\\1", proto_runs$CompletedCycles))
which is outputting....
101000)

CodePudding user response:

If this is your data:

string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'

you can use str_extract_all() with lookarounds and a negated character class:

library(stringr)
as.numeric(unlist(str_extract_all(string, '(?<=")[^",]+(?=")')))
[1] 101   0   0   0

To take the sum simply wrap the function sum around the expression:

sum(as.numeric(unlist(str_extract_all(string, '(?<=")[^",]+(?=")'))))
[1] 101
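
The question asks for a sum per row of a column rather than for a single string. A minimal base-R sketch of the same lookaround pattern applied element-wise is below; the vector `cycles` is a made-up stand-in for `proto_runs$CompletedCycles`, and its second element is invented for illustration.

```r
# Hypothetical stand-in for proto_runs$CompletedCycles
cycles <- c('c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")',
            'c(Read1 = "151", Index1 = "8", Index2 = "8", Read2 = "151")')

# perl = TRUE is required for the lookbehind/lookahead assertions
hits <- regmatches(cycles, gregexpr('(?<=")[^",]+(?=")', cycles, perl = TRUE))

# One sum per element of the input vector
totals <- vapply(hits, function(v) sum(as.numeric(v)), numeric(1))
totals
#> [1] 101 318
```

Because `regmatches()` returns one character vector per input element, the values from different rows are never pooled together, which is what `unlist()` would do on a whole column.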

CodePudding user response:

I think you mean you have character strings like this:

string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'

In which case you can do:

gsub("[\",A-z\\(\\)=]|([A-z]\\d)", "", string)
#> [1] "  101   0   0   0"

If you want the sum of the numbers, you could have:

sapply(strsplit(trimws(gsub("[\",A-z\\(\\)=]|([A-z]\\d)", "", string)), "\\D+"), 
       function(x) sum(as.numeric(x)))
#> [1] 101

Though an even easier way that actually returns a vector of numbers would be:

as.numeric(eval(parse(text = string)))
#> [1] 101   0   0   0
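
For a gsub() call closer to the original attempt, one option is to capture each quoted value and put a space after it in the replacement. This is only a sketch: the pattern is a swapped-in alternative, not the one from the question, and `x` is a made-up vector whose second element is invented for illustration.

```r
# Hypothetical example vector standing in for the column in the question
x <- c('c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")',
       'c(Read1 = "151", Index1 = "8", Index2 = "8", Read2 = "151")')

# Keep each quoted value (\\1) and append a space as the delimiter;
# trimws() drops the trailing space left after the last value
spaced <- trimws(gsub('[^"]*"([^"]*)"[^"]*', "\\1 ", x))
spaced[1]
#> [1] "101 0 0 0"

# Split on the delimiter and sum each element separately
row_sums <- vapply(strsplit(spaced, " ", fixed = TRUE),
                   function(v) sum(as.numeric(v)), numeric(1))
row_sums
#> [1] 101 318
```

Any other delimiter works the same way, as long as the replacement string and the strsplit() separator agree.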

CodePudding user response:

Now this is interesting: if your string is valid R syntax, just parse and evaluate it:

string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'

as.integer(eval(parse(text=string)))
[1] 101   0   0   0
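
One caveat: eval(parse()) runs whatever code the string contains, so it may be worth checking the shape of the string first if the column comes from an untrusted source. The sketch below assumes every entry looks like c(name = "digits", ...). The helper name `parse_cycles` and the validation pattern are hypothetical, not part of the answer above.

```r
# Hypothetical helper: evaluate only strings shaped like c(name = "digits", ...)
parse_cycles <- function(s) {
  shape <- '^c\\((\\s*[A-Za-z0-9_.]+\\s*=\\s*"[0-9]+"\\s*,?)+\\)$'
  if (!grepl(shape, s, perl = TRUE)) {
    return(NA_real_)                        # refuse anything unexpected
  }
  vals <- eval(parse(text = s))             # named character vector
  setNames(as.numeric(vals), names(vals))   # as.numeric() alone drops the names
}

string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'
nums <- parse_cycles(string)
names(nums)
#> [1] "Read1"  "Index1" "Index2" "Read2"
sum(nums)
#> [1] 101
```

Keeping the names lets you pull out a single field later, for example `nums["Read1"]`.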