Vectorization to extract and bind very nested data

Time:10-21

I have some very nested data. Within my list-column-dataframes, there are some pieces I need to put together and I've done so in a single instance to get my desired dataframe:

a <- df[[2]][["result"]]@data
b <- df[[2]][["result"]]@coords

desired_df <- cbind(a, b)

My original large list has 171 elements (3.3 GB in total), so every index from 1 to 171 needs to go inside those square brackets. Ideally I would end up with 171 desired dataframes, which I would then bind together.

I haven't needed to write a loop in 10 years, but I don't see a tidyverse way to deal with this. I also no longer know how to write loops. There are definitely some elements in there that are junk and will fail.

CodePudding user response:

If I understand your data structure, which I probably don't, you could do:

library(tidyverse)

# Create dummy data
df <- mtcars
df$mpg <- list(result = I(list('test')))
df$mpg$result <- list("@data" = I(list('your data')))
df <- df %>% select(mpg, cyl)
df1 <- df
df2 <- df

# Pull data you're interested in. 
# The index is 1 here, instead of 2, because it's fake data and not your data.
# Assuming the @ is not special (not S4 slot access), and is just part of a name parsed from JSON or some other format.
dont_at_me <- function(x){
  a <- x[[1]][["result"]][["@data"]]
  a
}

# Get a list of all of your data.frames
all_dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))

# Vectorize
purrr::map(all_dfs, ~dont_at_me(.))

CodePudding user response:

You haven't provided any sort of minimal example of the data.

I've assumed it boils down to something like this:

methods::setClass(
  "weird_object",
  slots = c(data = "data.frame", coords = "matrix")
)


df <- list(
  list(
    result = new("weird_object")
  ),list(
    result = new("weird_object")
  ),list(
    result = new("weird_object")
  ),list(
    result = new("weird_object")
  )
) 

And if I had such a list with these objects, then I could do

# map(), enframe(), unnest_wider() and %>% all come from the tidyverse
library(tidyverse)

df %>% 
  map(. %>% {
    list(data = .$result@data,
         coords = .$result@coords)
  }) %>% 
  enframe() %>% 
  unnest_wider(value)

However, the selecting / hoisting function might fail on some elements, so you can wrap it in purrr::possibly() and choose a reasonable default:

df %>% 
  map(possibly(. %>% {
    list(data = .$result@data,
         coords = .$result@coords)
  }, 
  otherwise = list(data = NA, coords = NA))) %>% 
  enframe() %>% 
  unnest_wider(value)
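
If the end goal is the single bound data frame from the question (cbind of the data and coords slots for each element, then everything stacked), a minimal sketch could look like the one below. It assumes that each data slot has the same columns across elements and that coords is a matrix with column names. The weird_object class, the make_obj() helper and the toy values are made up for illustration; only the slot names data and coords come from the question.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for the real S4 class; only the slot names
# `data` and `coords` are taken from the question.
methods::setClass(
  "weird_object",
  slots = c(data = "data.frame", coords = "matrix")
)

# Hypothetical helper that builds a toy object with n rows.
make_obj <- function(n) {
  methods::new(
    "weird_object",
    data   = data.frame(id = seq_len(n), value = seq_len(n) * 10),
    coords = matrix(seq_len(2 * n), ncol = 2,
                    dimnames = list(NULL, c("x", "y")))
  )
}

# Two usable elements and one junk element whose `result` has no slots.
df <- list(
  list(result = make_obj(2)),
  list(result = "junk"),
  list(result = make_obj(3))
)

# Same steps as the single-element version in the question.
extract_one <- function(x) {
  res <- x[["result"]]
  cbind(res@data, res@coords)
}

# Return NULL instead of stopping when an element cannot be extracted.
safe_extract <- possibly(extract_one, otherwise = NULL)

pieces <- map(df, safe_extract)

# TRUE where extraction succeeded; which(!ok) lists the junk elements.
ok <- !map_lgl(pieces, is.null)

# Stack the successful pieces into one data frame.
combined <- bind_rows(pieces[ok])
```

Returning NULL as the default, rather than a list of NAs, keeps the failed elements out of the final bind, and which(!ok) tells you which of the 171 positions to inspect by hand.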

Hopefully, this could be a step forward.
