I have a string that looks like this that I want to convert to a data.frame:
string <- "var1: 1, var2: 2, [{\"json_var1\": \"foo\", \"json_var2\": 1}]"
## this is expected
jsonlite::fromJSON(string)
#> Error: lexical error: invalid char in json text.
#> var1: 1, var2: 2, [{"json_var1"
#> (right here) ------^
## the json part is valid:
jsonlite::fromJSON("[{\"json_var1\": \"foo\", \"json_var2\": 1}]")
#> json_var1 json_var2
#> 1 foo 1
## Desired state
#> var1 var2 json_var1 json_var2
#> 1 1 2 foo 1
Is there any way to split off the pre-JSON text so that I can parse it separately? In this case we can't rely on the number of elements in each section (non-JSON vs. JSON), but we can rely on there being only two sections.
CodePudding user response:
Here is one option: split the string into two parts, then apply fromJSON to the JSON part and read.dcf to the rest.
cbind(read.dcf(textConnection(gsub(",\\s*", "\n", sub("^([^\\[]+),\\s*\\[.*",
"\\1", string)))), jsonlite::fromJSON(sub("^[^\\[]+", "", string)))
-output
var1 var2 json_var1 json_var2
1 1 2 foo 1
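The one-liner above can be hard to read, so here is a step-by-step sketch of the same idea. It assumes the first "[" in the string always marks the start of the JSON section. The names pos, prefix, json, prefix_df and result are only illustrative. Since read.dcf returns a character matrix, the sketch also converts the prefix columns to their natural types, which the one-liner does not do.

```r
library(jsonlite)

string <- "var1: 1, var2: 2, [{\"json_var1\": \"foo\", \"json_var2\": 1}]"

# Position of the first "[", assumed to be where the JSON section starts
pos <- as.integer(regexpr("[", string, fixed = TRUE))

# Text before the bracket, with the trailing comma and whitespace removed
prefix <- sub(",\\s*$", "", substr(string, 1, pos - 1))
# Text from the bracket to the end of the string
json <- substr(string, pos, nchar(string))

# Put each "key: value" pair on its own line so read.dcf can parse it
con <- textConnection(gsub(",\\s*", "\n", prefix))
prefix_df <- as.data.frame(read.dcf(con), stringsAsFactors = FALSE)
close(con)

# read.dcf yields character values; convert them to numbers where possible
prefix_df[] <- lapply(prefix_df, type.convert, as.is = TRUE)

result <- cbind(prefix_df, fromJSON(json))
result
```

If a value in the prefix could itself contain a "[" or a comma, this split would break, and a stricter parser would be needed.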
Another option is to turn the whole expression into valid JSON by quoting the bare keys and moving the opening [{ to the front:
jsonlite::fromJSON(paste0("[{", gsub("(\\w+)(?=:)",
'"\\1"', sub("\\[\\{", "", string), perl = TRUE)))
var1 var2 json_var1 json_var2
1 1 2 foo 1