Home > database >  Assign row id when converting JSON to dataframe
Assign row id when converting JSON to dataframe

Time:04-12

I have a dataframe with some user information. One of these columns contains JSON strings where is where all the data I'm interested in. Like this:

eg <- tibble(user_id = c("10001", "10002"),
             data = c("{'key': 'age', 'value': {'max': 40, 'min': 31}}", 
                      "{'key': 'age', 'value': {'max': 30, 'min': 21}}"))

I unpack thise JSON with the following code, first fixing a formatting issue, then unpacking the JSON into nested lists with purr::map, and then converting the nested lists into a dataframe:

eg$data_clean <- gsub("'", "\"", eg$data)
eg2 <- purrr::map(eg$data_clean, jsonlite::fromJSON)
eg3 <- rbindlist(eg2, fill = TRUE)

the problem with the final dataframe (eg3) is that I loose the user_id, and can't associate values to users. In reality the JSON is much more complex and not every user ends up with the same number of rows. Does anyone know how to convert my JSON into a dataframe, whilst preserving the user_id for the data? Many thanks.

CodePudding user response:

eg %>%
  group_by(user_id) %>%
  mutate(data = list(jsonlite::fromJSON(chartr("'", '"', data))))%>%
  unnest_wider(data)%>%
  unnest_wider(value)

  user_id key     max   min
  <chr>   <chr> <int> <int>
1 10001   age      40    31
2 10002   age      30    21

CodePudding user response:

You could do:

eg$data_clean <- gsub("'", "\"", eg$data)

purrr::map(eg$data_clean, jsonlite::fromJSON) %>%
purrr::map2(eg$user_id, 
            ~ tibble(user_id = .y, min = .x$value$min, max = .x$value$max)) %>%
data.table::rbindlist()
#>    user_id min max
#> 1:   10001  31  40
#> 2:   10002  21  30

CodePudding user response:

Another possible solution:

library(tidyverse)

eg %>% 
  mutate(map_dfr(data, ~ str_extract_all(.x, "\\d ") %>%
          map(~ set_names(.x, c("max", "min")))), data = NULL)

#> # A tibble: 2 × 3
#>   user_id max   min  
#>   <chr>   <chr> <chr>
#> 1 10001   40    31   
#> 2 10002   30    21

Yet another possible solution:

library(tidyverse)

eg %>%
  rowwise %>%
  mutate(aux = str_extract_all(data, "\\d ") %>% unlist %>% str_c(collapse = ","), data=NULL) %>% 
  separate(aux, into = c("max", "min"), sep=",", convert = T)

#> # A tibble: 2 × 3
#>   user_id   max   min
#>   <chr>   <int> <int>
#> 1 10001      40    31
#> 2 10002      30    21
  • Related