I have a dataframe with some user information. One of these columns contains JSON strings where is where all the data I'm interested in. Like this:
eg <- tibble(user_id = c("10001", "10002"),
data = c("{'key': 'age', 'value': {'max': 40, 'min': 31}}",
"{'key': 'age', 'value': {'max': 30, 'min': 21}}"))
I unpack thise JSON with the following code, first fixing a formatting issue, then unpacking the JSON into nested lists with purr::map
, and then converting the nested lists into a dataframe:
eg$data_clean <- gsub("'", "\"", eg$data)
eg2 <- purrr::map(eg$data_clean, jsonlite::fromJSON)
eg3 <- rbindlist(eg2, fill = TRUE)
the problem with the final dataframe (eg3) is that I loose the user_id, and can't associate values to users. In reality the JSON is much more complex and not every user ends up with the same number of rows. Does anyone know how to convert my JSON into a dataframe, whilst preserving the user_id for the data? Many thanks.
CodePudding user response:
eg %>%
group_by(user_id) %>%
mutate(data = list(jsonlite::fromJSON(chartr("'", '"', data))))%>%
unnest_wider(data)%>%
unnest_wider(value)
user_id key max min
<chr> <chr> <int> <int>
1 10001 age 40 31
2 10002 age 30 21
CodePudding user response:
You could do:
eg$data_clean <- gsub("'", "\"", eg$data)
purrr::map(eg$data_clean, jsonlite::fromJSON) %>%
purrr::map2(eg$user_id,
~ tibble(user_id = .y, min = .x$value$min, max = .x$value$max)) %>%
data.table::rbindlist()
#> user_id min max
#> 1: 10001 31 40
#> 2: 10002 21 30
CodePudding user response:
Another possible solution:
library(tidyverse)
eg %>%
mutate(map_dfr(data, ~ str_extract_all(.x, "\\d ") %>%
map(~ set_names(.x, c("max", "min")))), data = NULL)
#> # A tibble: 2 × 3
#> user_id max min
#> <chr> <chr> <chr>
#> 1 10001 40 31
#> 2 10002 30 21
Yet another possible solution:
library(tidyverse)
eg %>%
rowwise %>%
mutate(aux = str_extract_all(data, "\\d ") %>% unlist %>% str_c(collapse = ","), data=NULL) %>%
separate(aux, into = c("max", "min"), sep=",", convert = T)
#> # A tibble: 2 × 3
#> user_id max min
#> <chr> <int> <int>
#> 1 10001 40 31
#> 2 10002 30 21