Mixed string representation

Time:08-18

I have a string that looks like this that I want to convert to a data.frame:

string <- "var1: 1, var2: 2, [{\"json_var1\": \"foo\", \"json_var2\": 1}]"

## this is expected
jsonlite::fromJSON(string)
#> Error: lexical error: invalid char in json text.
#>                                        var1: 1, var2: 2, [{"json_var1"
#>                      (right here) ------^

## the json part is valid:
jsonlite::fromJSON("[{\"json_var1\": \"foo\", \"json_var2\": 1}]")
#>   json_var1 json_var2
#> 1       foo         1

## Desired state
#>   var1 var2 json_var1 json_var2
#> 1    1    2       foo         1

Is there any way to split off the pre-JSON text so that I can parse it separately? We can't rely on the number of elements in each section (non-JSON vs. JSON), but we can rely on there being exactly 2 sections.

CodePudding user response:

Here is one option: split the string into two parts, then apply fromJSON to the JSON part and read.dcf to the rest.

cbind(read.dcf(textConnection(gsub(",\\s*", "\n", sub("^([^\\[]+),\\s*\\[.*",
 "\\1", string)))), jsonlite::fromJSON(sub("^[^\\[]+", "", string)))

-output

  var1 var2 json_var1 json_var2
1    1    2       foo         1
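The one-liner above can be hard to follow, so here is a minimal step-by-step sketch of the same split-and-parse idea. The helper name parse_mixed is hypothetical (not part of any package), and it assumes the first "[" in the string marks the start of the JSON section. Unlike the one-liner, it also converts the read.dcf values from character to their natural types with type.convert.

```r
library(jsonlite)

# Hypothetical helper: split at the first "[" and parse each half separately.
# Assumes the non-JSON prefix never contains "[".
parse_mixed <- function(x) {
  pos <- as.integer(regexpr("[", x, fixed = TRUE))  # position of the first "["
  stopifnot(pos > 0)

  prefix <- substr(x, 1, pos - 1)     # e.g. "var1: 1, var2: 2, "
  json   <- substr(x, pos, nchar(x))  # e.g. "[{...}]"

  # Drop the trailing comma, then put each "key: value" pair on its own line
  prefix <- sub(",\\s*$", "", prefix)
  dcf    <- gsub(",\\s*", "\n", prefix)

  con <- textConnection(dcf)
  on.exit(close(con))

  # read.dcf returns a character matrix; convert it to typed columns
  kv <- as.data.frame(read.dcf(con), stringsAsFactors = FALSE)
  kv[] <- lapply(kv, type.convert, as.is = TRUE)

  cbind(kv, fromJSON(json))
}

string <- "var1: 1, var2: 2, [{\"json_var1\": \"foo\", \"json_var2\": 1}]"
res <- parse_mixed(string)
print(res)
```

Splitting with regexpr and substr keeps the JSON section untouched, so commas inside the JSON never interfere with the key-value parsing of the prefix.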

Another option is to turn the whole string into valid JSON by quoting the bare keys and moving the [{ to the front:

jsonlite::fromJSON(paste0("[{", gsub("(\\w+)(?=:)", 
   '"\\1"', sub("\\[\\{", "", string), perl = TRUE)))
  var1 var2 json_var1 json_var2
1    1    2       foo         1
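To see what the second option does to the string, the sketch below runs the same three transformations one at a time and prints the intermediate JSON text. The variable names no_open, quoted and json are only illustrative.

```r
string <- "var1: 1, var2: 2, [{\"json_var1\": \"foo\", \"json_var2\": 1}]"

# Step 1: remove the "[{" that opens the JSON section
no_open <- sub("[{", "", string, fixed = TRUE)

# Step 2: quote bare keys, i.e. runs of word characters immediately
# followed by ":". Keys that are already quoted are followed by a
# double quote rather than ":", so they are left alone.
quoted <- gsub("(\\w+)(?=:)", '"\\1"', no_open, perl = TRUE)

# Step 3: put "[{" back at the very start so the whole string is one JSON array
json <- paste0("[{", quoted)
cat(json, "\n")

res <- jsonlite::fromJSON(json)
print(res)
```

Note that the lookahead matches any word directly followed by a colon, so this approach would also quote text inside values (for example the "http" in a URL) and break the JSON. It is safest when the values are plain numbers or simple strings.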