Generating a three dimensional data structure with purrr map

Time:09-08

I have a list of 1000 simulations, each of which is a sample vector of length 100:

library(purrr)
# rerun() is deprecated as of purrr 1.0.0; map() over an index does the same job
simulations <- map(1:1000, ~ rbinom(100, 100, 0.8)) %>% set_names(paste0("sim", 1:1000))

I also have a numeric vector of length 500, one value per step:

# Each entry is a step from 1 to 500
foo <- rnorm(500, 200, 75) %>% set_names(paste0("step", 1:500))

I want to multiply each vector in simulations by foo. Taking the first sample vector from simulations, we get:

values <- simulations$sim1 %*% t(foo)

Where values is of dimension 100 x 500. I'd like to generate a data structure that is flattened along the three dimensions sim, step and values:

sim step values
1 step1 v1
.. .. ..
1 step500 v500
1 .. ..
1 step1 v501
1 .. ..
1 step500 v1000
1 .. ..
1 step500 v50000
2 step1 v1
.. .. ..
.. .. ..
1000 step500 v50000

So ultimately I should end up with a data structure that has 1000 x 500 x 100 = 50,000,000 rows. If possible, I'd like an approach that uses purrr::map.
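The dimension arithmetic above can be checked on a toy version of the problem. The sketch below uses base R only; the sizes (3 simulations, 4 draws, 5 steps) and the names sims and steps are assumptions chosen to keep the output small, not part of the original data.

```r
# Toy sizes (assumed for illustration): 3 simulations, 4 draws each, 5 steps
set.seed(1)
n_sim <- 3; n_draw <- 4; n_step <- 5

sims <- setNames(
  replicate(n_sim, rbinom(n_draw, 100, 0.8), simplify = FALSE),
  paste0("sim", seq_len(n_sim))
)
steps <- setNames(rnorm(n_step, 200, 75), paste0("step", seq_len(n_step)))

# x %*% t(steps) is the outer product: one row per draw, one column per step
values <- sims$sim1 %*% t(steps)
dim(values)        # 4 5
colnames(values)   # "step1" "step2" "step3" "step4" "step5"

# Flattening every simulation should therefore give this many rows
n_sim * n_draw * n_step   # 60
```

With the real sizes the same product is 1000 x 100 x 500, which is where the 50,000,000 rows come from.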

CodePudding user response:

I think this might do it:

library(tidyverse)
  
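imap_dfr(
  simulations,
  # .x is one simulation vector, .y is its name ("sim1", "sim2", ...)
  ~ data.frame(.x %*% t(foo)) |>
    # one row per (draw, step) pair, with columns named as in the question
    pivot_longer(cols = everything(), names_to = "step", values_to = "values") |>
    mutate(sim = as.integer(str_replace(.y, "sim", ""))) |>
    relocate(sim)
)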

CodePudding user response:

In base R, we can loop over the list with lapply, multiply each vector by foo, and stack the result into a two-column data.frame. We then add the sim column by cbinding each data.frame with its position in the list, and finally rbind everything together:

out <- do.call(rbind, Map(cbind, sim = seq_along(simulations), 
     lapply(simulations, \(x) stack(as.data.frame(x %*% t(foo))))))
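To see what the stack and cbind steps contribute, here is a minimal sketch on a toy matrix. The 2 x 3 matrix m and its values are assumptions for illustration; it stands in for one x %*% t(foo) result.

```r
# Toy 2 x 3 matrix standing in for one x %*% t(foo) result (assumed values)
m <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("step1", "step2", "step3")))

# stack() turns the columns into a long two-column data.frame
long <- stack(as.data.frame(m))
names(long)   # "values" "ind"
nrow(long)    # 6

# cbind() recycles the scalar sim index down every row
out_one <- cbind(sim = 1L, long)
names(out_one)   # "sim" "values" "ind"
```

Note that stack() walks the matrix column by column, so rows come out grouped by step rather than by draw. That differs from the row order sketched in the question, although the set of rows should be the same.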

CodePudding user response:

With purrr you can do it this way:

# vector multiplication of each simulation with foo
map(simulations, `%*%`, t(foo)) %>% 

  # set result to dataframe and row-bind them
  map_dfr(as.data.frame, .id = "sim") %>% 

  # reshape to get the result you need
  tidyr::pivot_longer(-sim, names_to = "step")

#> # A tibble: 50'000'000 x 3
#>    sim   step    value
#>    <chr> <chr>   <dbl>
#>  1 sim1  step1  22321.
#>  2 sim1  step2  21469.
#>  3 sim1  step3  17064.
#>  4 sim1  step4   4216.
#>  5 sim1  step5  20458.
#>  6 sim1  step6  16251.
#>  7 sim1  step7  15630.
#>  8 sim1  step8   7445.
#>  9 sim1  step9  13624.
#> 10 sim1  step10 19202.
#> # ... with 49,999,990 more rows
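Since the full result has 50 million rows, it may be worth trying the pipeline on a small input first. The sketch below is one way to do that, assuming purrr 1.0.0 or later (for list_rbind) and tidyr are installed; the toy sizes and the names sims and steps are assumptions, not part of the original data.

```r
library(purrr)
library(tidyr)

# Toy sizes (assumed): 3 simulations of 4 draws each, 5 steps
set.seed(42)
sims  <- map(1:3, ~ rbinom(4, 100, 0.8)) %>% set_names(paste0("sim", 1:3))
steps <- rnorm(5, 200, 75) %>% set_names(paste0("step", 1:5))

long <- sims %>%
  # one 4 x 5 data frame per simulation, columns named step1..step5
  map(~ as.data.frame(.x %*% t(steps))) %>%
  # row-bind the data frames, storing each list name in a sim column
  list_rbind(names_to = "sim") %>%
  # one row per (sim, draw, step) combination
  pivot_longer(-sim, names_to = "step", values_to = "values")

dim(long)   # 60 3
```

Here 60 is 3 x 4 x 5, the small-scale analogue of 1000 x 100 x 500.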