I have a large nested list whose structure is similar to the dummy data below. I would like to loop through this list and apply a function only to the elements (vectors) named "seq", ignoring the other elements. Each "seq" vector should be passed to another function ("function_of_interest" in the pseudocode below), and the output should be appended to a new list. Because of the size of my data, I would like to do this in parallel.
Here is the dummy input:
x <- list(
  one = list(one_1 = list(seq = 1:9, start = 1, end = 5),
             one_2 = list(seq = 2:11, start = 2, end = 6),
             one_3 = list(seq = 3:12, start = 3, end = 7)),
  two = list(two_1 = list(seq = 1:13, start = 8, end = 222),
             two_2 = list(seq = 1:14, start = 13, end = 54))
)
And here is one of my attempts which failed:
# loop through the nested list
for (gene in seq_along(genes_list)) {
  for (segment in seq_along(genes_list[[gene]])) {
    output_list <- c(output_list, foreach::foreach(segment) %dopar% function_of_interest(genes_list[[gene]][[segment]]))
  }
}
I would be glad for any help or guidance.
CodePudding user response:
How about this solution, which calls lapply() on each element of x in parallel and applies function_of_interest() to the element named seq in each second-level nested list? Note: this requires that each of those lists does in fact have an element named seq. If some of them might not, you would need to add code that tests for this in each list.
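The check mentioned in the note above could look roughly like the following. This is only a sketch: apply_to_seq is a hypothetical helper name, and returning NA for a list without a seq element is an assumption (skipping it or raising an error may suit your data better). The two-element segments list is made up for illustration.

```r
# Hypothetical helper: apply f to the "seq" element of one segment,
# or return NA when the segment has no element with that name.
apply_to_seq <- function(segment, f) {
  if (!("seq" %in% names(segment))) {
    return(NA)  # assumption: NA marks a segment without "seq"
  }
  f(segment[["seq"]])
}

# Illustrative data: one segment with "seq", one without.
segments <- list(
  with_seq    = list(seq = 1:9, start = 1, end = 5),
  without_seq = list(start = 2, end = 6)
)

# Extra arguments to lapply() are passed on to apply_to_seq().
checked <- lapply(segments, apply_to_seq, f = sum)
```

Inside a parallel loop, the same helper could replace the direct x_ij[['seq']] lookup, so that a single malformed segment does not stop the whole run.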
I defined a function_of_interest() to test your code.
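library(foreach)
library(doParallel)
# Assumption: doParallel as the backend with 2 workers; any registered foreach backend works
registerDoParallel(cores = 2)
function_of_interest <- function(vec) sum(vec)
output_list <- foreach(i = seq_along(x)) %dopar% {
  lapply(x[[i]], function(x_ij) function_of_interest(x_ij[['seq']]))
}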
outputs:
[[1]]
[[1]]$one_1
[1] 45
[[1]]$one_2
[1] 65
[[1]]$one_3
[1] 75
[[2]]
[[2]]$two_1
[1] 91
[[2]]$two_2
[1] 105
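Because the loop iterates over seq_along(x), the top-level names one and two do not appear in the result, which prints [[1]] and [[2]] instead. If you need them, they can be copied back from x afterwards. A minimal sketch: a sequential lapply() stands in for the foreach loop here so that the snippet runs without a parallel backend, and it assumes the results come back in the same order as x (foreach's default).

```r
x <- list(
  one = list(one_1 = list(seq = 1:9, start = 1, end = 5),
             one_2 = list(seq = 2:11, start = 2, end = 6),
             one_3 = list(seq = 3:12, start = 3, end = 7)),
  two = list(two_1 = list(seq = 1:13, start = 8, end = 222),
             two_2 = list(seq = 1:14, start = 13, end = 54))
)

function_of_interest <- function(vec) sum(vec)

# Sequential stand-in for the parallel loop: iterating over indices
# yields an unnamed top-level list, just like the output above.
output_list <- lapply(seq_along(x), function(i) {
  lapply(x[[i]], function(x_ij) function_of_interest(x_ij[["seq"]]))
})

# Restore the top-level names from the input list.
names(output_list) <- names(x)
```

After this, a single result can be looked up by name, for example output_list$two$two_2.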