Convert JSON to data.frame using tidyjson-CodePudding

I have a JSON file that I want to convert to a table. It is easy to do with the jsonlite library. However, if the file is big, then the conversion takes a significant amount of time. So I am testing a tidyjson with the hope to speed up the process.

My JSON file looks as follows:

x = '[
    {
      "id": 1,
      "A": [
        {
          "B": "b1",
          "C": [
            "c1"
          ]
        }
      ]
    },
    {
      "id": 2,
      "A": [
        {
          "B": "b1",
          "C": [
            "c2"
          ]
        }
      ]
    }
]
'

That's how I process it:

library(tidyjson)
library(dplyr)
      
x %>% gather_array() %>%      
      spread_values(id = jstring("id")) %>%
      enter_object("A") %>% gather_array %>%
      spread_values(B = jstring("B")) %>%
      enter_object("C") %>% gather_array() %>%
      spread_values(C = jstring("C")) %>%
      select(id, B, C)

Outcome I get:

 ..JSON   id    B     C    
  <chr>    <chr> <chr> <chr>
1 "\"c1\"" 1     b1    NA   
2 "\"c2\"" 2     b1    NA

Cannot figure out what is wrong with the code and why it doesn’t work well for C. Any help is much appreciated.

UPDATE: Expected output:

 id    B     C    
 <chr> <chr> <chr>
1 1     b1    c1   
2 2     b1    c2

UPDATE 2: jsonlite way:

y = jsonlite::fromJSON(x)
cbind(id = y$id, do.call(rbind.data.frame, y$A))
   

  id  B  C
1  1 b1 c1
2  2 b1 c2

Not sure that it is the fastest way of using jsonlite in this case.

CodePudding user response：

We may use fromJSON from jsonlite

library(jsonlite)
library(tidyr)
library(dplyr)
fromJSON(x) %>% 
   unnest_wider(A) %>%
   unnest(C) %>%
   unnest(C)

-output

# A tibble: 2 x 3
     id B     C    
  <int> <chr> <chr>
1     1 b1    c1   
2     2 b1    c2

Or another option is

library(reticulate)
library(rrapply)
py_run_string(paste0("x = ", x))
rrapply(py$x, how = 'bind')
  A.1.C A.1.B id
1    c1    b1  1
2    c2    b1  2