Home > Software design >  Read JSON file with nested lists in R
Read JSON file with nested lists in R

Time:01-30

I have a large json dataset and I would like to convert it to a data frame in R

(Sorry if it may be a duplicated question but other answers didn't help me) My Json file is as follows:

[{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.899-1", "orig_lang": "en", "new": ["- having regard to its resolution of 13", "December 20071 on Justice for the", "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", "wartime victims, including victims of the", "'comfort women' system,", "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 6 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.863v01-00", "reference": "2014/2016(INI)", "date": "2014-02-27T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.863-1", "orig_lang": "en", "new": ["- having regard to the Statement by the", "Vice-President of the Commission/ High", "Representative of the Union for Foreign", "affairs and Security Policy (VP/HR)", "Catherine Ashton of 20 March 2013 on", "the Magnitsky case in the Russian", "Federation,"], "authors": "Jacek Protasiewicz", "meps": [23782], "location": [["Motion for a resolution", "Citation 4 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.713v01-00", "reference": "2013/2149(INI)", "date": "2014-02-12T00:00:00", "committee": ["AFET"], "seq": 238, "id": "PE529.713-238", "orig_lang": "en", "old": ["A. whereas the European Neighbourhood", "Policy (ENP), in particular the Eastern", "Partnership (EaP), aims to extend the", "values and ideas of the founders of the EU;"], "new": ["A. whereas the European Neighbourhood", "Policy (ENP) embraces the values and", "ideas of the founders of the EU, notably", "the principles of Peace, Solidarity and", "Prosperity;"], "authors": "Mário David", "meps": [96973], "location": [["Motion for a resolution", "Recital A"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 2, "id": "PE529.899-2", "orig_lang": "en", "new": ["- having regard to the catastrophic", "earthquake and subsequent tsunami", "which devastated important parts of", "Japan's coast on 11 March 2011 and led", "to the destruction of the Fukushima", "nuclear power plant, causing possibly the", "greatest radiation disaster in human", "history,"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 11 a (new)"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}

I would like to have a dataframe as follows:

         src               peid          reference                date           committee        seq        id        orig_lang             new                  ...  
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ... p.531"]          ...
http://www.europarl.eu PE529.863v01-00  2014/2016(INI)    2014-02-27T00:00:00       AFET          128     PE529.899-1       en      ["- having ..."Federation,"]  ...
http://www.europarl.eu PE529.713v01-00  2013/2149(INI)    2014-02-12T00:00:00       AFET          238     PE529.899-1       en      ["- having ..."Federation,"]    ...
http://www.europarl.eu PE529.899v01-00  2014/2021(INI)    2014-03-05T00:00:00       AFET           1      PE529.899-1       en      ["- having ..."Federation,"]    ...

(I didn't write the complete table above)

I have already tried the following codes:

library(rjson)
library(jsonlite)
Data <- fromJSON(file="data.json")

but each row is shown as below:

[[1]]
[[1]]$src
[1] "http://www.europarl.eu/sides/getDoc.do?pubRef=-//EP//NONSGML COMPARL PE-529.899 01 DOC PDF V0//EN&language=EN"

[[1]]$peid
[1] "PE529.899v01-00"

[[1]]$reference
[1] "2014/2021(INI)"

[[1]]$date
[1] "2014-03-05T00:00:00"

[[1]]$committee
[1] "AFET"

[[1]]$seq
[1] 1

[[1]]$id
[1] "PE529.899-1"

[[1]]$orig_lang
[1] "en"

[[1]]$new
[1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
[3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
[5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
[7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
[9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"             

[[1]]$authors
[1] "Reinhard Bütikofer on behalf of the Verts/ALE Group"

[[1]]$meps
[1] 96739

[[1]]$location
[[1]]$location[[1]]
[1] "Motion for a resolution" "Citation 6 a (new)"     


[[1]]$meta
[[1]]$meta$created
[1] "2019-07-03T05:06:17"


[[1]]$changes
list()

dput version is below:

list(list(src = "http://www.europarl.eu", 
    peid = "PE529.899v01-00", reference = "2014/2021(INI)", date = "2014-03-05T00:00:00", 
    committee = "AFET", seq = 1, id = "PE529.899-1", orig_lang = "en", 
    new = c("- having regard to its resolution of 13", "December 20071 on Justice for the", 
    "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", 
    "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", 
    "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", 
    "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", 
    "wartime victims, including victims of the", "'comfort women' system,", 
    "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"
    ), authors = "Reinhard Bütikofer on behalf of the Verts/ALE Group", 
    meps = 96739, location = list(c("Motion for a resolution", 
    "Citation 6 a (new)")), meta = list(created = "2019-07-03T05:06:17"), 
    changes = list()))

One of the problems that I have is in column 9 as you can see below, I want to put all the 15 components in one cell of the dataframe

[[1]]$new
 [1] "- having regard to its resolution of 13"   "December 20071 on Justice for the"        
 [3] "'Comfort Women' (sex slaves in Asia"       "before and during World War II) as well"  
 [5] "as the statements by Japanese Chief"       "Cabinet Secretary Yohei Kono in 1993"     
 [7] "and by the then Prime Minister Tomiichi"   "Murayama in 1995, the resolutions of the" 
 [9] "Japanese parliament (the Diet) of 1995"    "and 2005 expressing apologies for"        
[11] "wartime victims, including victims of the" "'comfort women' system,"                  
[13] "_______________________"                   "1"                                        
[15] "OJ C 323E, 18.12.2008, p.531"

How can I get the table I mentioned above?

CodePudding user response:

We may either convert the nested list elements with lengths greater than 1 to a single string by pasteing (str_c) and then bind the named list to columns with _dfr

library(purrr)
library(dplyr)
library(stringr)
map_dfr(Data, ~ map(.x, unlist) %>%
     map_dfr(~ if(length(.x) > 1) str_c(.x, collapse = ";") else .x))

Or use a recursive function rrapply to bind the elements having length greater than 1 as list column

library(rrapply)
map_dfr(Data, ~ rrapply(.x, how = "bind"))
  • Related