I have strings like this:
c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")
I would like to use gsub() and a regex to extract the integer values inside each pair of quotes.
The output would be something like:
101 0 0 0
The values can also be separated by a delimiter. For my final output, I would like the sum of the values in each row.
I have a crappy method in R that extracts it all but doesn't insert a delimiter or space, and wanted to ask for help in doing that. Or just a better method if anyone has one. :)
Currently using....
<- as.data.frame(gsub('.*?"(.*?)".*?', "\\1", proto_runs$CompletedCycles))
which is outputting....
101000)
CodePudding user response:
If this is your data:
string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'
you can use str_extract_all() from stringr with lookarounds and a negated character class:
library(stringr)
as.numeric(unlist(str_extract_all(string, '(?<=")[^",]+(?=")')))
[1] 101   0   0   0
To take the sum, simply wrap sum() around the expression:
sum(as.numeric(unlist(str_extract_all(string, '(?<=")[^",]+(?=")'))))
[1] 101
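The question asks for one sum per row rather than for a single string. A minimal sketch of that, using the same regex with base R's regmatches() and gregexpr() instead of stringr; the vector `cycles` and its second entry are made-up stand-ins for a column such as proto_runs$CompletedCycles:

```r
# Hypothetical stand-in for a character column like proto_runs$CompletedCycles
cycles <- c(
  'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")',
  'c(Read1 = "151", Index1 = "8", Index2 = "8", Read2 = "151")'
)

# gregexpr() finds every quoted value in each string (perl = TRUE is
# needed for the lookarounds); regmatches() returns one character
# vector of matches per input string
matches <- regmatches(cycles, gregexpr('(?<=")[^",]+(?=")', cycles, perl = TRUE))

# Sum the extracted values separately for each row
row_sums <- vapply(matches, function(v) sum(as.numeric(v)), numeric(1))
row_sums
#> [1] 101 318
```

The result can be assigned back as a new column if the input really is a data frame column.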
CodePudding user response:
I think you mean you have character strings like this:
string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'
In which case you can do:
gsub("[\",A-z\\(\\)=]|([A-z]\\d)", "", string)
#> [1] " 101 0 0 0"
If you want the sum of the numbers, you could have:
sapply(strsplit(trimws(gsub("[\",A-z\\(\\)=]|([A-z]\\d)", "", string)), "\\D+"),
       function(x) sum(as.numeric(x)))
#> [1] 101
Though an even easier way that actually returns a vector of numbers would be:
as.numeric(eval(parse(text = string)))
#> [1] 101   0   0   0
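If the goal is specifically a gsub() call that keeps a delimiter between the values, one possible sketch is to capture only what sits inside each pair of quotes and write a space after it. This assumes the values never contain a double quote themselves; the names `spaced` and `total` are just for illustration:

```r
string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'

# Match everything up to a quoted value, capture the value, then consume
# everything up to the next quote; replace each match with "<value> "
spaced <- trimws(gsub('[^"]*"([^"]*)"[^"]*', "\\1 ", string))
spaced
#> [1] "101 0 0 0"

# Split on the delimiter and add up the pieces
total <- sum(as.numeric(strsplit(spaced, " ", fixed = TRUE)[[1]]))
total
#> [1] 101
```

Any other delimiter can be used by changing the replacement string and the split pattern together.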
CodePudding user response:
Now this is interesting: if your string is valid R syntax, just parse it and evaluate it:
string <- 'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")'
as.integer(eval(parse(text=string)))
[1] 101   0   0   0
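To get the per-row sums the question asks for, the same idea can be applied to each string in turn. A rough sketch, where `cycles` is a made-up stand-in for the real column; note that eval(parse()) runs whatever code the string contains, so this is only reasonable for input you trust:

```r
# Hypothetical stand-in for a character column like proto_runs$CompletedCycles
cycles <- c(
  'c(Read1 = "101", Index1 = "0", Index2 = "0", Read2 = "0")',
  'c(Read1 = "151", Index1 = "8", Index2 = "8", Read2 = "151")'
)

# Parse and evaluate each string into a named character vector,
# convert it to integers, and sum; one integer result per row
row_sums <- vapply(
  cycles,
  function(s) sum(as.integer(eval(parse(text = s)))),
  integer(1),
  USE.NAMES = FALSE
)
row_sums
#> [1] 101 318
```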