I'm new to JSON data and am having trouble getting my data into a single, combined data frame in R. Here is an example of the JSON data:
{
"id": "rub_al_khali",
"conversion_px": 0.0395882818685669,
"n_surfaces": 4,
"lithic_contours": [
{
"surface_id": 0,
"classification": "Ventral",
"total_area_px": 530565.5,
"total_area": 831.5,
"max_breadth": 22.4,
"max_length": 54,
"polygon_count": 7,
"scar_count": 0,
"percentage_detected_scars": 0,
"scar_contours": []
},
{
"surface_id": 1,
"classification": "Dorsal",
"total_area_px": 530503.5,
"total_area": 831.4,
"max_breadth": 22.4,
"max_length": 54,
"polygon_count": 7,
"scar_count": 4,
"percentage_detected_scars": 0.62,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 129337,
"total_area": 202.7,
"max_breadth": 10.3,
"max_length": 41.7,
"percentage_of_surface": 0.24,
"scar_angle": 1.85,
"polygon_count": 5
},
{
"scar_id": 1,
"total_area_px": 100130,
"total_area": 156.9,
"max_breadth": 7.2,
"max_length": 43,
"percentage_of_surface": 0.19,
"scar_angle": 357.36,
"polygon_count": 4
},
{
"scar_id": 2,
"total_area_px": 93162,
"total_area": 146,
"max_breadth": 6.5,
"max_length": 41.4,
"percentage_of_surface": 0.18,
"scar_angle": 5.01,
"polygon_count": 4
},
{
"scar_id": 3,
"total_area_px": 6148.5,
"total_area": 9.6,
"max_breadth": 4,
"max_length": 7.1,
"percentage_of_surface": 0.01,
"scar_angle": "NaN",
"polygon_count": 9
}
]
},
{
"surface_id": 2,
"classification": "Lateral",
"total_area_px": 176204,
"total_area": 276.2,
"max_breadth": 8.6,
"max_length": 54.2,
"polygon_count": 3,
"scar_count": 2,
"percentage_detected_scars": 0.33,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 44605,
"total_area": 69.9,
"max_breadth": 5,
"max_length": 50,
"percentage_of_surface": 0.25,
"scar_angle": "NaN",
"polygon_count": 3
},
{
"scar_id": 1,
"total_area_px": 12877,
"total_area": 20.2,
"max_breadth": 1.5,
"max_length": 22.3,
"percentage_of_surface": 0.07,
"scar_angle": "NaN",
"polygon_count": 2
}
]
},
{
"surface_id": 3,
"classification": "Platform",
"total_area_px": 55252.5,
"total_area": 86.6,
"max_breadth": 20.3,
"max_length": 6.6,
"polygon_count": 5,
"scar_count": 1,
"percentage_detected_scars": 0.42,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 23298.5,
"total_area": 36.5,
"max_breadth": 15,
"max_length": 4.1,
"percentage_of_surface": 0.42,
"scar_angle": "NaN",
"polygon_count": 4
}
]
}
]
}
So far I've used jsonlite to import the file into R with flatten = TRUE:
library(jsonlite)
dta <- fromJSON("~/rub_al_khali.json", flatten = TRUE)
While this gets me halfway there, the result is not really a combined, comprehensive data.frame. I think dta$lithic_contours might be what is causing the issue. Any help is much appreciated.
Answer:
jsonlite::fromJSON() returns a list, but its element lithic_contours is already a data.frame. Just subset the list to get your data.frame:
# Subset the list on lithic_contours with $ ...
df <- jsonlite::fromJSON("~/rub_al_khali.json", flatten = TRUE)$lithic_contours
# ... and it's already a data.frame
class(df)
#> [1] "data.frame"
# Turning into a tibble for better printing
tibble::as_tibble(df)
#> # A tibble: 4 × 10
#> surface_id classification total_area_px total_area max_breadth max_length
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 0 Ventral 530566. 832. 22.4 54
#> 2 1 Dorsal 530504. 831. 22.4 54
#> 3 2 Lateral 176204 276. 8.6 54.2
#> 4 3 Platform 55252. 86.6 20.3 6.6
#> # … with 4 more variables: polygon_count <int>, scar_count <int>,
#> # percentage_detected_scars <dbl>, scar_contours <list>
Created on 2022-04-04 by the reprex package (v2.0.1)
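If the top-level fields (such as id) should travel with each surface, they can be copied onto the data.frame after subsetting. The following is only a minimal sketch: the inline JSON is a trimmed stand-in for the file in the question, and the variable name surfaces is made up for illustration.

```r
library(jsonlite)

# Trimmed stand-in for the real file (assumption: same top-level layout)
json <- '{
  "id": "rub_al_khali",
  "n_surfaces": 2,
  "lithic_contours": [
    {"surface_id": 0, "classification": "Ventral", "scar_count": 0},
    {"surface_id": 1, "classification": "Dorsal", "scar_count": 4}
  ]
}'

# fromJSON() accepts a JSON string as well as a file path
dta <- fromJSON(json, flatten = TRUE)

# The top level is a list: scalar fields plus the lithic_contours data.frame
names(dta)

# Pull out the data.frame and attach the top-level id to every row
surfaces <- dta$lithic_contours
surfaces$id <- dta$id
surfaces
```

The scalar id is recycled across all rows, so every surface keeps a reference to the artefact it came from. This could be handy if several JSON files are later bound together.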
Update: unnesting list column
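The scar_contours column of your data.frame is a list column. This is often quite a convenient format for analysis, but if you want to remove it you can use tidyr::unnest(). Note that rows whose list element is empty (here the Ventral surface, which has no scars) are dropped by default; unnest() has a keep_empty argument for retaining them. Also, with names_repair = "minimal" the surface-level and scar-level columns that share a name (total_area_px, total_area, max_breadth, max_length, polygon_count) both appear under the same name: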
library(tidyr)
df %>% unnest(scar_contours, names_repair = "minimal")
#> # A tibble: 7 × 17
#> surface_id classification total_area_px total_area max_breadth max_length
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Dorsal 530504. 831. 22.4 54
#> 2 1 Dorsal 530504. 831. 22.4 54
#> 3 1 Dorsal 530504. 831. 22.4 54
#> 4 1 Dorsal 530504. 831. 22.4 54
#> 5 2 Lateral 176204 276. 8.6 54.2
#> 6 2 Lateral 176204 276. 8.6 54.2
#> 7 3 Platform 55252. 86.6 20.3 6.6
#> # … with 11 more variables: polygon_count <int>, scar_count <int>,
#> # percentage_detected_scars <dbl>, scar_id <int>, total_area_px <dbl>,
#> # total_area <dbl>, max_breadth <dbl>, max_length <dbl>,
#> # percentage_of_surface <dbl>, scar_angle <dbl>, polygon_count <int>
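One way to keep the scar-level columns distinguishable from the surface-level ones is the names_sep argument of unnest(), which prefixes the inner column names with the name of the list column. This is a small sketch on a trimmed, made-up JSON string (every surface has at least one scar here), not the full file from the question.

```r
library(jsonlite)
library(tidyr)

# Trimmed stand-in for the real data (assumption: same nesting, fewer fields)
json <- '{
  "id": "example",
  "lithic_contours": [
    {"surface_id": 1, "classification": "Dorsal", "total_area": 831.4,
     "scar_contours": [
       {"scar_id": 0, "total_area": 202.7},
       {"scar_id": 1, "total_area": 156.9}
     ]},
    {"surface_id": 2, "classification": "Lateral", "total_area": 276.2,
     "scar_contours": [
       {"scar_id": 0, "total_area": 69.9}
     ]}
  ]
}'

surfaces <- fromJSON(json, flatten = TRUE)$lithic_contours

# names_sep pastes the outer and inner names together, so the scar columns
# become scar_contours_scar_id and scar_contours_total_area
scars <- unnest(surfaces, scar_contours, names_sep = "_")
names(scars)
```

The result has one row per scar, and the surface-level total_area no longer shares a name with the scar-level one, so columns can be selected by name without ambiguity.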