Home > Software engineering >  R - Query JSON and Save to Variable
R - Query JSON and Save to Variable

Time:11-25

So I'm trying to clean some data in R, Im not very experinced with R. What I would like to do is Get a certain peice of Data from a JSON and store it into a Varaible The JSON is very large and messy, I tried using JSON Lite and using the $ Variable selector but couldnt find a way to grab the specific peice of Data I needed, Theres very little information and resource out there on JSON Manipulation in R

Screenshot of JSON

JSON File :

{
  "error": false,
  "msg": "all countries and population 1961 - 2018",
  "data": [
    {
      "country": "Arab World",
      "code": "ARB",
      "populationCounts": [
        {
          "year": 1960,
          "value": 92197753
        },
        {
          "year": 1961,
          "value": 94724510
        },
        {
          "year": 1962,
          "value": 97334442
        }
      ]
    }
  ]
}

So lets say I want to gather the 1962 Value into a Variable, what would be the best way I do that

Any suggestion would be a huge help

CodePudding user response:

You are on the right lines with jsonlite. The data structure that you get in R depends on the JSON structure itself, of course, so you need to spend some time exploring the object in R to understand how you can manipulate it.

Assuming you have a file like your example data, named mydata.json in the current path, you can read it like this:

library(jsonlite)
library(tidyr) # comes in later

j <- fromJSON("mydata.json")

What's in j ?

names(j)

[1] "error" "msg"   "data" 

How about data ?

str(j$data)

'data.frame':   1 obs. of  3 variables:
 $ country         : chr "Arab World"
 $ code            : chr "ARB"
 $ populationCounts:List of 1
  ..$ :'data.frame':    3 obs. of  2 variables:
  .. ..$ year : int  1960 1961 1962
  .. ..$ value: int  92197753 94724510 97334442

So j$data is a dataframe with 3 columns. The last one, with the years and values is a list column. So we can get into that using tidyr::unnest():

j$data %>% 
  unnest(cols = c(populationCounts))

Result:

# A tibble: 3 × 4
  country    code   year    value
  <chr>      <chr> <int>    <int>
1 Arab World ARB    1960 92197753
2 Arab World ARB    1961 94724510
3 Arab World ARB    1962 97334442
  • Related