So I have a foreach loop that assigns several different calculations inside the loop to separate variables. When it runs as a normal for loop, all of the variables are accessible, but when I switch to a foreach loop, only the last 'stored' variable is returned. How do I change this?
# assume the parallel backend is already registered;
# it won't affect the result
library(foreach)
set.seed(1)
data <- dplyr::tibble(
  x = rnorm(5),
  y = rnorm(5, sd = 0.5),
  z = rnorm(5, sd = 0.25)
)
sims = 10
df = foreach(sim = 1:sims) %dopar% {
  calc <- data |>
    furrr::future_pmap_dfr(
      function(x, y, z) {
        res <- (x + y) / z
        names(res) <- 'col'
        return(res)
      }
    )
  calc_2 <- data |>
    furrr::future_pmap_dfr(
      function(x, y, z) {
        res <- (x + y + z)^2
        names(res) <- 'col'
        return(res)
      }
    )
}
df[[1]]
returns:
# A tibble: 5 × 1
col
<dbl>
1 0.434
2 0.275
3 0.387
4 1.77
5 0.210
When I switch the order of calc and calc_2, it now returns:
# A tibble: 5 × 1
col
<dbl>
1 -2.74
2 4.38
3 3.00
4 -3.40
5 0.629
The loop does not make the other calculation accessible. When I run the foreach loop inside my real function (which works as intended with a for loop), the function fails because only one of the three variables assigned in the foreach loop is kept and available for the rest of the function call. How do I make the foreach loop keep every variable calculated inside it, as a for loop does, so that they can be carried through the rest of the function?
Answer:
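foreach does not work through side effects the way a for loop does: it returns a list holding the value of the loop body from each iteration, and the value of a { } body is its last expression. Here that is the assignment to calc_2, so calc is computed and then discarded. Return a list of all the results you need instead, and process it accordingly.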
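Why only the last value comes back can be seen without any parallel backend. Below is a minimal base R sketch that uses lapply() as a sequential stand-in for foreach (an assumption for illustration; the names a, b, first and second are hypothetical):

```r
# A braced block evaluates to its last expression. An assignment is an
# expression too: it returns the assigned value (invisibly).
out <- {
  a <- 1
  b <- 2
}
out  # 2: only the value of the last line survives

# The same happens for each iteration of an lapply() (or foreach) body.
per_iter <- lapply(1:3, function(i) {
  first  <- i
  second <- i * 10
})
unlist(per_iter)  # 10 20 30: `first` is computed but never returned

# Ending the body with a list keeps both values.
per_iter <- lapply(1:3, function(i) {
  first  <- i
  second <- i * 10
  list(first = first, second = second)
})
per_iter[[2]]$first  # 2
```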
df = foreach(sim = 1:sims) %dopar% {
  calc <- data |>
    furrr::future_pmap_dfr(
      function(x, y, z) {
        res <- (x + y) / z
        names(res) <- 'col'
        return(res)
      }
    )
  calc_2 <- data |>
    furrr::future_pmap_dfr(
      function(x, y, z) {
        res <- (x + y + z)^2
        names(res) <- 'col'
        return(res)
      }
    )
  # the last expression is what each iteration returns, so return both
  list(calc = calc, calc_2 = calc_2)
}
df is now a list of length sims, where each element is a list of length 2 containing calc and calc_2.
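One way to "process accordingly" is to return a named list per iteration and then pull each result type out by name. The sketch below is a hedged illustration, not the only approach: it uses base R (lapply() in place of foreach, data.frame() in place of tibble, and vectorised arithmetic in place of future_pmap_dfr) so it runs without a parallel backend, and the names results, all_calc and all_calc_2 are hypothetical.

```r
# Same simulated data as in the question, built with base R.
set.seed(1)
data <- data.frame(
  x = rnorm(5),
  y = rnorm(5, sd = 0.5),
  z = rnorm(5, sd = 0.25)
)
sims <- 3

# Stand-in for foreach(sim = 1:sims) %dopar% { ... }: each iteration
# ends with a named list, so both results survive.
results <- lapply(seq_len(sims), function(sim) {
  calc   <- data.frame(col = (data$x + data$y) / data$z)
  calc_2 <- data.frame(col = (data$x + data$y + data$z)^2)
  list(calc = calc, calc_2 = calc_2)
})

# Pull one result type out of every simulation and stack them.
all_calc   <- do.call(rbind, lapply(results, `[[`, "calc"))
all_calc_2 <- do.call(rbind, lapply(results, `[[`, "calc_2"))

dim(all_calc)  # 15 1: 5 rows per simulation x 3 simulations
```

Naming the list elements means later code can refer to "calc" and "calc_2" instead of positions 1 and 2, which keeps working if more results are added to the returned list.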