Split list of dataframes and return separate list entries instead of sublists

I have a list of dataframes that I would like to split based on a column, in this case the cluster column.

d1 <- data.frame(y1=c(1,2,3), cluster=c(1,2,6))
d2 <- data.frame(y1=c(3,2,1), cluster=c(6,2,4))

my.list <- list(d1, d2)

Using lapply(my.list, function(x) split(x, x$cluster)) returns the split dataframes as sublists. Is it possible to split the dataframes and get the resulting dataframes as separate entries of a single flat list?

The desired output would be something like this:

my.list2 <- list(df1_cl1, df1_cl2, df1_cl6, df2_cl6, df2_cl2, df2_cl4)

Thank you!

CodePudding user response：

Your first step is correct. To get the data into the required structure, unlist the lapply output with recursive = FALSE, which removes only the outer level of nesting and leaves each dataframe intact.

my.list2 <- unlist(lapply(my.list, function(x) split(x, x$cluster)),
                   recursive = FALSE)

my.list2
#$`1`
#  y1 cluster
#1  1       1

#$`2`
#  y1 cluster
#2  2       2

#$`6`
#  y1 cluster
#3  3       6

#$`2`
#  y1 cluster
#2  2       2

#$`4`
#  y1 cluster
#3  1       4

#$`6`
#  y1 cluster
#1  3       6

length(my.list2)
#[1] 6

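Note that the names are not unique, because the cluster values 2 and 6 occur in both dataframes. If you only need positional access, you can drop the names of the list with unname(my.list2).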
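If you would rather have names in the df1_cl1 style from the question, one option is to name the input list before splitting and then rewrite the names that unlist builds. This is only a minimal sketch: the labels df1 and df2 are an assumption (the original list is unnamed), and the order within each dataframe follows the sorted cluster values, so it differs slightly from the order listed in the question.

```r
d1 <- data.frame(y1 = c(1, 2, 3), cluster = c(1, 2, 6))
d2 <- data.frame(y1 = c(3, 2, 1), cluster = c(6, 2, 4))
my.list <- list(d1, d2)

# Label each input dataframe; "df1", "df2" are illustrative labels (assumption).
names(my.list) <- paste0("df", seq_along(my.list))

# Split each dataframe by cluster, giving a list of sublists.
pieces <- lapply(my.list, function(x) split(x, x$cluster))

# Flatten one level; unlist joins outer and inner names with ".", e.g. "df1.1".
my.list2 <- unlist(pieces, recursive = FALSE)

# Replace the first "." in each name so "df1.1" becomes "df1_cl1".
names(my.list2) <- sub(".", "_cl", names(my.list2), fixed = TRUE)

names(my.list2)
#[1] "df1_cl1" "df1_cl2" "df1_cl6" "df2_cl2" "df2_cl4" "df2_cl6"
```

With unique names, an individual piece can then be picked out by name, for example my.list2$df2_cl4.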

CodePudding user response：

Another possible solution, based on dplyr::group_split and purrr::map:

library(tidyverse)

map(my.list, ~ group_split(.x, .x$cluster, .keep = FALSE)) %>% flatten()

#> [[1]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     1       1
#> 
#> [[2]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     2       2
#> 
#> [[3]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     3       6
#> 
#> [[4]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     2       2
#> 
#> [[5]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     1       4
#> 
#> [[6]]
#> # A tibble: 1 × 2
#>      y1 cluster
#>   <dbl>   <dbl>
#> 1     3       6
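
The result above is unnamed. If you want names in the df1_cl1 style, a variant that may work is to combine base split (which keeps the cluster values as names) with purrr::list_flatten and its name_spec argument. Treat this as a sketch under assumptions: it needs purrr 1.0.0 or later, where list_flatten is available and is the recommended replacement for flatten, and the labels df1 and df2 are made up for illustration.

```r
library(purrr)  # assumes purrr >= 1.0.0, which provides list_flatten()

d1 <- data.frame(y1 = c(1, 2, 3), cluster = c(1, 2, 6))
d2 <- data.frame(y1 = c(3, 2, 1), cluster = c(6, 2, 4))

# "df1" and "df2" are illustrative labels, not part of the original data.
my.list <- list(df1 = d1, df2 = d2)

# split() names each piece after its cluster value ("1", "2", "6", ...).
pieces <- map(my.list, function(x) split(x, x$cluster))

# Remove one level of nesting and combine outer and inner names via a glue spec.
my.list2 <- list_flatten(pieces, name_spec = "{outer}_cl{inner}")

names(my.list2)
```

Unlike group_split, this keeps plain data.frames rather than tibbles, so convert with tibble::as_tibble afterwards if you need tibbles.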