I have a large json dataset and I would like to convert it to a data frame in R
(Sorry if it may be a duplicated question but other answers didn't help me) My Json file is as follows:
[{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.899-1", "orig_lang": "en", "new": ["- having regard to its resolution of 13", "December 20071 on Justice for the", "'Comfort Women' (sex slaves in Asia", "before and during World War II) as well", "as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993", "and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the", "Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for", "wartime victims, including victims of the", "'comfort women' system,", "_______________________", "1", "OJ C 323E, 18.12.2008, p.531"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 6 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.863v01-00", "reference": "2014/2016(INI)", "date": "2014-02-27T00:00:00", "committee": ["AFET"], "seq": 1, "id": "PE529.863-1", "orig_lang": "en", "new": ["- having regard to the Statement by the", "Vice-President of the Commission/ High", "Representative of the Union for Foreign", "affairs and Security Policy (VP/HR)", "Catherine Ashton of 20 March 2013 on", "the Magnitsky case in the Russian", "Federation,"], "authors": "Jacek Protasiewicz", "meps": [23782], "location": [["Motion for a resolution", "Citation 4 a (new)"]], "meta": {"created": "2019-07-03T05:06:17"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.713v01-00", "reference": "2013/2149(INI)", "date": "2014-02-12T00:00:00", "committee": ["AFET"], "seq": 238, "id": "PE529.713-238", "orig_lang": "en", "old": ["A. whereas the European Neighbourhood", "Policy (ENP), in particular the Eastern", "Partnership (EaP), aims to extend the", "values and ideas of the founders of the EU;"], "new": ["A. whereas the European Neighbourhood", "Policy (ENP) embraces the values and", "ideas of the founders of the EU, notably", "the principles of Peace, Solidarity and", "Prosperity;"], "authors": "Mário David", "meps": [96973], "location": [["Motion for a resolution", "Recital A"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}
,{"src": "http://www.europarl.eu", "peid": "PE529.899v01-00", "reference": "2014/2021(INI)", "date": "2014-03-05T00:00:00", "committee": ["AFET"], "seq": 2, "id": "PE529.899-2", "orig_lang": "en", "new": ["- having regard to the catastrophic", "earthquake and subsequent tsunami", "which devastated important parts of", "Japan's coast on 11 March 2011 and led", "to the destruction of the Fukushima", "nuclear power plant, causing possibly the", "greatest radiation disaster in human", "history,"], "authors": "Reinhard Bütikofer on behalf of the Verts/ALE Group", "meps": [96739], "location": [["Motion for a resolution", "Citation 11 a (new)"]], "meta": {"created": "2019-07-03T05:06:18"}, "changes": {}}
I would like to have a dataframe as follows:
src peid reference date committee seq id orig_lang new ...
http://www.europarl.eu PE529.899v01-00 2014/2021(INI) 2014-03-05T00:00:00 AFET 1 PE529.899-1 en ["- having ... p.531"] ...
http://www.europarl.eu PE529.863v01-00 2014/2016(INI) 2014-02-27T00:00:00 AFET 128 PE529.899-1 en ["- having ..."Federation,"] ...
http://www.europarl.eu PE529.713v01-00 2013/2149(INI) 2014-02-12T00:00:00 AFET 238 PE529.899-1 en ["- having ..."Federation,"] ...
http://www.europarl.eu PE529.899v01-00 2014/2021(INI) 2014-03-05T00:00:00 AFET 1 PE529.899-1 en ["- having ..."Federation,"] ...
(I didn't write the complete table above)
I have already tried the following codes:
library(rjson)
library(jsonlite)
Data <- fromJSON(file="data.json")
but each row is shown as below:
[[1]]
[[1]]$src
[1] "http://www.europarl.eu/sides/getDoc.do?pubRef=-//EP//NONSGML COMPARL PE-529.899 01 DOC PDF V0//EN&language=EN"
[[1]]$peid
[1] "PE529.899v01-00"
[[1]]$reference
[1] "2014/2021(INI)"
[[1]]$date
[1] "2014-03-05T00:00:00"
[[1]]$committee
[1] "AFET"
[[1]]$seq
[1] 1
[[1]]$id
[1] "PE529.899-1"
[[1]]$orig_lang
[1] "en"
[[1]]$new
[1] "- having regard to its resolution of 13" "December 20071 on Justice for the"
[3] "'Comfort Women' (sex slaves in Asia" "before and during World War II) as well"
[5] "as the statements by Japanese Chief" "Cabinet Secretary Yohei Kono in 1993"
[7] "and by the then Prime Minister Tomiichi" "Murayama in 1995, the resolutions of the"
[9] "Japanese parliament (the Diet) of 1995" "and 2005 expressing apologies for"
[11] "wartime victims, including victims of the" "'comfort women' system,"
[13] "_______________________" "1"
[15] "OJ C 323E, 18.12.2008, p.531"
[[1]]$authors
[1] "Reinhard Bütikofer on behalf of the Verts/ALE Group"
[[1]]$meps
[1] 96739
[[1]]$location
[[1]]$location[[1]]
[1] "Motion for a resolution" "Citation 6 a (new)"
[[1]]$meta
[[1]]$meta$created
[1] "2019-07-03T05:06:17"
[[1]]$changes
list()
dput version is below:
list(list(src = "http://www.europarl.eu",
peid = "PE529.899v01-00", reference = "2014/2021(INI)", date = "2014-03-05T00:00:00",
committee = "AFET", seq = 1, id = "PE529.899-1", orig_lang = "en",
new = c("- having regard to its resolution of 13", "December 20071 on Justice for the",
"'Comfort Women' (sex slaves in Asia", "before and during World War II) as well",
"as the statements by Japanese Chief", "Cabinet Secretary Yohei Kono in 1993",
"and by the then Prime Minister Tomiichi", "Murayama in 1995, the resolutions of the",
"Japanese parliament (the Diet) of 1995", "and 2005 expressing apologies for",
"wartime victims, including victims of the", "'comfort women' system,",
"_______________________", "1", "OJ C 323E, 18.12.2008, p.531"
), authors = "Reinhard Bütikofer on behalf of the Verts/ALE Group",
meps = 96739, location = list(c("Motion for a resolution",
"Citation 6 a (new)")), meta = list(created = "2019-07-03T05:06:17"),
changes = list()))
One of the problems that I have is in column 9 as you can see below, I want to put all the 15 components in one cell of the dataframe
[[1]]$new
[1] "- having regard to its resolution of 13" "December 20071 on Justice for the"
[3] "'Comfort Women' (sex slaves in Asia" "before and during World War II) as well"
[5] "as the statements by Japanese Chief" "Cabinet Secretary Yohei Kono in 1993"
[7] "and by the then Prime Minister Tomiichi" "Murayama in 1995, the resolutions of the"
[9] "Japanese parliament (the Diet) of 1995" "and 2005 expressing apologies for"
[11] "wartime victims, including victims of the" "'comfort women' system,"
[13] "_______________________" "1"
[15] "OJ C 323E, 18.12.2008, p.531"
How can I get the table I mentioned above?
CodePudding user response:
We may either convert the nested list
elements with lengths
greater than 1 to a single string by paste
ing (str_c
) and then bind the named list to columns with _dfr
library(purrr)
library(dplyr)
library(stringr)
map_dfr(Data, ~ map(.x, unlist) %>%
map_dfr(~ if(length(.x) > 1) str_c(.x, collapse = ";") else .x))
Or use a recursive function rrapply
to bind
the elements having length
greater than 1 as list
column
library(rrapply)
map_dfr(Data, ~ rrapply(.x, how = "bind"))