How to reshape a list with sublists?


> mylist
  truth model1 model2
1     1      2    1.0
2     2      3   -0.5
3     3     -1    4.0

  truth model1 model2
1     1      1      2
2     2      4      2
3     3      4      1

I have a list that contains a number of sublists. In the example above there are 2, but there can be more than 2.

Each sublist is a data.frame that contains the truth and the predictions from model1 and model2. I want to reshape my list so that each sublist corresponds to one model, i.e., I would like:

  truth result.1 result.2
1     1        2        1
2     2        3        4
3     3       -1        4

  truth result.1 result.2
1     1      1.0        2
2     2     -0.5        2
3     3      4.0        1

Is there a quick way to reshape the list this way?
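For anyone who wants to run the answers below, here is one way to rebuild the printed list as R code. It is a sketch that assumes the two elements are named result.1 and result.2 (the names the desired output uses as column headers) and that truth is an integer column; neither is shown explicitly in the printout above.

```r
# Rebuild the example list from the printed values.
# Assumption: element names result.1 / result.2 (taken from the desired column names).
mylist <- list(
  result.1 = data.frame(truth = 1:3, model1 = c(2, 3, -1), model2 = c(1, -0.5, 4)),
  result.2 = data.frame(truth = 1:3, model1 = c(1, 4, 4), model2 = c(2, 2, 1))
)
```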

Answer:

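Using cbind inside do.call:

# For each model column i (column 1 is truth), bind truth with column i of every data frame
lapply(2:ncol(L[[1]]), \(i) do.call(cbind, c(L[[1]][, 1, drop = FALSE], lapply(L, `[[`, i)))) |>
  setNames(names(L[[1]])[-1])  # name the results after the model columns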
# $model1
#      truth result.1 result.2
# [1,]     1        2        1
# [2,]     2        3        4
# [3,]     3       -1        4
# $model2
#      truth result.1 result.2
# [1,]     1      1.0        2
# [2,]     2     -0.5        2
# [3,]     3      4.0        1

Note: requires R >= 4.1 (for the native pipe |> and the \(i) lambda shorthand).


Data:

L <- list(result.1 = structure(list(truth = 1:3, model1 = c(2, 3, 
-1), model2 = c(1, -0.5, 4)), class = "data.frame", row.names = c(NA, 
-3L)), result.2 = structure(list(truth = 1:3, model1 = c(1, 4, 
4), model2 = c(2, 2, 1)), class = "data.frame", row.names = c(NA, 
-3L)))
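Because cbind() is applied to plain vectors here, the result is a list of matrices (hence the [1,] row labels in the output). If data frames are preferred, a minimal variant along the same lines is sketched below. It avoids \(i) and |>, so it should also run on R versions before 4.1, and it assumes (as in the example) that every data frame has the same columns and the same truth values in the same order. The names models and res are just illustrative.

```r
# Example data (same values as the data above)
L <- list(
  result.1 = data.frame(truth = 1:3, model1 = c(2, 3, -1), model2 = c(1, -0.5, 4)),
  result.2 = data.frame(truth = 1:3, model1 = c(1, 4, 4), model2 = c(2, 2, 1))
)

# Model columns: everything except "truth" (taken from the first data frame)
models <- setdiff(names(L[[1]]), "truth")

res <- lapply(setNames(models, models), function(m) {
  # lapply() pulls column m out of every data frame; the list names
  # (result.1, result.2, ...) become the new column names
  data.frame(truth = L[[1]]$truth, lapply(L, `[[`, m))
})
res$model2
```

Since truth is taken from the first data frame only, this silently misaligns rows if the data frames are ordered differently; the merge-based answer below matches rows on truth instead.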

Answer:

Consider iterating over the distinct model column names with a chained merge (Reduce + merge):

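newlist <- sapply(
  # distinct model column names across all data frames
  setdiff(unique(unlist(lapply(mylist, names))), "truth"),
  function(nm) {
    # keep truth + the current model column from each data frame, then merge on truth
    df <- Reduce(
      function(x, y) merge(x, y, by = "truth"),
      lapply(mylist, `[`, c("truth", nm))
    )
    setNames(df, c("truth", paste0("result.", 1:(ncol(df) - 1))))
  },
  simplify = FALSE
)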

  truth result.1 result.2
1     1        2        1
2     2        3        4
3     3       -1        4

  truth result.1 result.2
1     1      1.0        2
2     2     -0.5        2
3     3      4.0        1
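To see why the setNames() step is needed: when merge() joins two data frames that share a non-key column, it disambiguates the copies using its default suffixes = c(".x", ".y"). A small check of the model1 step on its own, using the same example data (pieces and merged are illustrative names):

```r
mylist <- list(
  result.1 = data.frame(truth = 1:3, model1 = c(2, 3, -1), model2 = c(1, -0.5, 4)),
  result.2 = data.frame(truth = 1:3, model1 = c(1, 4, 4), model2 = c(2, 2, 1))
)

# Keep only truth + model1 from each data frame
pieces <- lapply(mylist, `[`, c("truth", "model1"))

# Chain-merge on truth; the shared "model1" column gets .x / .y suffixes
merged <- Reduce(function(x, y) merge(x, y, by = "truth"), pieces)
names(merged)
```

With three data frames, the third copy of model1 no longer clashes with an existing name, so it keeps the plain name model1 next to model1.x and model1.y; renaming by position sidesteps this. Since rows are matched on truth, the approach relies on truth being unique within each data frame: duplicated keys would multiply rows.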

Answer:

Here's a tidyverse option: if you bind all the data frames into one and use the list's names to mark which trial (list element) each row comes from, it becomes a simple transpose operation: pivot the model columns longer, then pivot the trials wider. Then you can split again by model.

I added an additional model to test how it scales: nothing is hard-coded (not the number of trials or models, nor their names), and if a model is missing for one trial, you'll get NAs rather than an error.


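library(dplyr)  # for %>%, bind_rows() and select()

mylist %>%
  bind_rows(.id = "trial") %>%                                      # list names become the "trial" column
  tidyr::pivot_longer(matches("model\\d+"), names_to = "model") %>%  # one row per trial/truth/model
  tidyr::pivot_wider(names_from = trial) %>%                         # one column per trial
  split(.$model) %>%
  purrr::map(select, -model)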
#> $model1
#> # A tibble: 3 × 3
#>   truth result.1 result.2
#>   <int>    <dbl>    <dbl>
#> 1     1        2        1
#> 2     2        3        4
#> 3     3       -1        4
#> $model2
#> # A tibble: 3 × 3
#>   truth result.1 result.2
#>   <int>    <dbl>    <dbl>
#> 1     1      1          2
#> 2     2     -0.5        2
#> 3     3      4          1
#> $model3
#> # A tibble: 3 × 3
#>   truth result.1 result.2
#>   <int>    <dbl>    <dbl>
#> 1     1        0        9
#> 2     2        4        5
#> 3     3        2        2

Data (from jay.sf's answer, plus an additional dummy column model3):

mylist <- list(
  result.1 = structure(list(truth = 1:3, model1 = c(2, 3, -1), model2 = c(1, -0.5, 4), model3 = c(0, 4, 2)),
                       class = "data.frame", row.names = c(NA, -3L)),
  result.2 = structure(list(truth = 1:3, model1 = c(1, 4, 4), model2 = c(2, 2, 1), model3 = c(9, 5, 2)),
                       class = "data.frame", row.names = c(NA, -3L))
)
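For readers without the tidyverse, the same bind, pivot, split idea can be sketched in base R, with stats::reshape() standing in for tidyr's pivots. This is a rough equivalent, not a drop-in replacement: it assumes each data frame has a truth column and that every other column is a model prediction. To exercise the missing-model case, model3 is deliberately left out of result.2 here, and it should come out as NA rather than an error. The names long, wide and newlist are illustrative.

```r
# Data as above, except model3 is deliberately missing from result.2
mylist <- list(
  result.1 = data.frame(truth = 1:3, model1 = c(2, 3, -1), model2 = c(1, -0.5, 4),
                        model3 = c(0, 4, 2)),
  result.2 = data.frame(truth = 1:3, model1 = c(1, 4, 4), model2 = c(2, 2, 1))
)

# 1. Bind everything into one long data frame: one row per trial/truth/model
long <- do.call(rbind, lapply(names(mylist), function(tr) {
  df <- mylist[[tr]]
  models <- setdiff(names(df), "truth")
  data.frame(
    trial = tr,
    truth = rep(df$truth, times = length(models)),
    model = rep(models, each = nrow(df)),
    value = unlist(df[models], use.names = FALSE)
  )
}))

# 2. Pivot wider: one value column per trial, keyed by truth + model.
#    Trial/model combinations that do not exist are filled with NA.
wide <- reshape(long, idvar = c("truth", "model"), timevar = "trial",
                v.names = "value", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.result.1 -> result.1

# 3. Split by model and drop the helper column
newlist <- lapply(split(wide, wide$model), function(d) {
  d$model <- NULL
  rownames(d) <- NULL
  d
})
newlist$model3
```

The main appeal over the tidyverse pipeline is having no package dependencies; which version reads more clearly is a matter of taste.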
