Suppose I have a dataset dt like this:
meta_cat | cat | sku | price | sales |
---|---|---|---|---|
bakery | bread | 796590 | 22.6 | 24 |
bakery | bread | 796595 | 19.8 | 20 |
bakery | doughnut | 796588 | 30.6 | 36 |
bakery | sandwich | 796640 | 45.9 | 42 |
bakery | sandwich | 796643 | 43.3 | 45 |
fruits | feijoa | 645342 | 97.2 | 5 |
fruits | orange | 645675 | 35.7 | 78 |
fruits | orange | 645677 | 43.9 | 65 |
fruits | feijoa | 645342 | 92.9 | 11 |
Also, I have a list which looks like this, for example:
lvl_list <- list(c("meta_cat"),
c("cat"))
I don't know in advance how many levels the list will have: its length can be 0 (an empty list), 1, 2, 3, and so on (in this example there are two levels). The list values correspond to column names in the dataset.
My task is to run the nested for loops based on the length of the list.
If the list is empty, the loop does not start and the main code is executed.
If the list length = 1, there should be 1 for loop like this:
for(i in unique(dt[[lvl_list[[1]]]])){
  dt_i <- dt[get(lvl_list[[1]]) == i, ] # make subset; dt itself stays intact for the next iteration
  # run main code on dt_i
  # .
  # .
  # main code
}
So, at the first iteration, we filter the dt by the first unique value of the meta_cat column (for example, choose only records where meta_cat = "bakery") and run the main code on this filtered dt.
If the length of the list = 2, we should get 2 for loops:
for(i in unique(dt[[lvl_list[[1]]]])){
  dt_i <- dt[get(lvl_list[[1]]) == i, ] # filter dt by the first level
  for(j in unique(dt_i[[lvl_list[[2]]]])){
    dt_ij <- dt_i[get(lvl_list[[2]]) == j, ] # filter again by the second level
    # run main code on dt_ij
    # .
    # .
    # main code
  }
}
So, here we filter dt by the values of two columns. There are two unique values for the variable meta_cat and 5 unique values for the variable cat.
The logic of code execution should be as follows: at the first iteration, we filter the dt by the first value of meta_cat (leaving in dt the observations where meta_cat = "bakery"); at the first iteration of the second loop, we filter the dt by the first value of the cat variable (we choose the observations where cat = "bread"). So, we obtain a dt where meta_cat = "bakery" and cat = "bread". This filtered dt is then used as the input for the modelling code.
On the second iteration, the original dt is filtered by meta_cat = "bakery" and cat = "doughnut". Then the main code is executed for this dt, and so on.
If there are 3 levels in the list, we should have 3 for loops, etc.
My question: is it possible to create nested for loops dynamically, based on the list length?
I would be grateful for any help on how this can be implemented.
CodePudding user response:
It may be easier with split
lst1 <- lapply(split(dt, dt[[lvl_list[[1]]]]), function(x)
split(x, x[[lvl_list[[2]]]]))
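The two-level split above is tied to exactly two entries in lvl_list. As a minimal sketch of how it might be generalized to any list length (including an empty list), the splitting can be done recursively. The names run_by_levels and main_code are hypothetical and only serve the illustration, main_code is a stand-in for the real modelling code, and a plain data.frame is used here so that the example runs in base R without extra packages:

```r
# Sample data from the question, as a plain data.frame (assumption: base R only)
dt <- data.frame(
  meta_cat = c("bakery", "bakery", "bakery", "bakery", "bakery",
               "fruits", "fruits", "fruits", "fruits"),
  cat = c("bread", "bread", "doughnut", "sandwich", "sandwich",
          "feijoa", "orange", "orange", "feijoa"),
  sku = c(796590L, 796595L, 796588L, 796640L, 796643L,
          645342L, 645675L, 645677L, 645342L),
  price = c(22.6, 19.8, 30.6, 45.9, 43.3, 97.2, 35.7, 43.9, 92.9),
  sales = c(24L, 20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L),
  stringsAsFactors = FALSE
)
lvl_list <- list(c("meta_cat"), c("cat"))

# Hypothetical placeholder for the "main code": total sales of the subset
main_code <- function(d) sum(d$sales)

# Hypothetical helper: split by the first remaining level, then recurse
# on each piece with the remaining levels. With no levels left, the
# main code runs on whatever subset has been reached.
run_by_levels <- function(d, lvls, main_code) {
  if (length(lvls) == 0L) {
    return(main_code(d))
  }
  pieces <- split(d, d[[lvls[[1]]]], drop = TRUE)
  lapply(pieces, run_by_levels, lvls = lvls[-1], main_code = main_code)
}

res  <- run_by_levels(dt, lvl_list, main_code)  # nested list, one level per column
res0 <- run_by_levels(dt, list(), main_code)    # empty list: main code on the full data
```

Here res$bakery$bread holds the result for meta_cat = "bakery" and cat = "bread", and each recursion level plays the role of one for loop, so the depth follows the length of lvl_list.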
Also, as this is a recursive split, use rsplit from collapse, which by default does a recursive split and returns a nested list:
library(collapse)
lst2 <- rsplit(dt, by = dt[, unlist(lvl_list), with = FALSE])
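If the nesting itself is not needed, another possible approach is a flat split over all level columns at once, so that a single lapply replaces the nested loops whatever the length of lvl_list. The following is a sketch in base R rather than with collapse; main_code is again a hypothetical stand-in for the real modelling code, and the empty-list case is handled explicitly because there is nothing to split by:

```r
# Sample data from the question, as a plain data.frame (assumption: base R only)
dt <- data.frame(
  meta_cat = c("bakery", "bakery", "bakery", "bakery", "bakery",
               "fruits", "fruits", "fruits", "fruits"),
  cat = c("bread", "bread", "doughnut", "sandwich", "sandwich",
          "feijoa", "orange", "orange", "feijoa"),
  sales = c(24L, 20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L),
  stringsAsFactors = FALSE
)
lvl_list <- list(c("meta_cat"), c("cat"))

# Hypothetical placeholder for the "main code": total sales of the subset
main_code <- function(d) sum(d$sales)

cols <- unlist(lvl_list)  # NULL when lvl_list is empty

groups <- if (length(cols) == 0L) {
  # No levels: a single group containing the full data
  list(all = dt)
} else {
  # One group per existing combination of level values;
  # drop = TRUE discards combinations that never occur in the data
  split(dt, lapply(cols, function(cn) dt[[cn]]), drop = TRUE)
}

res <- lapply(groups, main_code)
```

The group names are the level values joined with a dot (for example "bakery.bread"), which is the default separator used by split when it is given a list of grouping vectors.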
data
dt <- structure(list(meta_cat = c("bakery", "bakery", "bakery", "bakery",
"bakery", "fruits", "fruits", "fruits", "fruits"), cat = c("bread",
"bread", "doughnut", "sandwich", "sandwich", "feijoa", "orange",
"orange", "feijoa"), sku = c(796590L, 796595L, 796588L, 796640L,
796643L, 645342L, 645675L, 645677L, 645342L), price = c(22.6,
19.8, 30.6, 45.9, 43.3, 97.2, 35.7, 43.9, 92.9), sales = c(24L,
20L, 36L, 42L, 45L, 5L, 78L, 65L, 11L)), row.names = c(NA, -9L
), class = c("data.table", "data.frame"))