I have a list of data frames that I would like to split based on a column, in this case the cluster column.
d1 <- data.frame(y1=c(1,2,3), cluster=c(1,2,6))
d2 <- data.frame(y1=c(3,2,1), cluster=c(6,2,4))
my.list <- list(d1, d2)
Using
lapply(my.list, function(x) split(x, x$cluster))
returns the split data frames as nested sublists. Is it possible to split the data frames so that each resulting data frame becomes a separate entry of a single, flat list?
The desired output would be something like this:
my.list2 <- list(df1_cl1, df1_cl2, df1_cl6, df2_cl6, df2_cl2, df2_cl4)
Thank you!
Answer:
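Your first step is correct. To get the data into the required structure, unlist the resulting list with recursive = FALSE, so that only the outer level of nesting is removed and each data frame becomes its own list element.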
my.list2 <- unlist(lapply(my.list , function(x)
split(x, x$cluster)), recursive = FALSE)
my.list2
#$`1`
# y1 cluster
#1 1 1
#$`2`
# y1 cluster
#2 2 2
#$`6`
# y1 cluster
#3 3 6
#$`2`
# y1 cluster
#2 2 2
#$`4`
# y1 cluster
#3 1 4
#$`6`
# y1 cluster
#1 3 6
length(my.list2)
#[1] 6
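To see why recursive = FALSE matters, here is a small sketch using the question's data. With the default recursive = TRUE, unlist() keeps descending into the data frames (which are themselves lists of columns) and returns a plain numeric vector; recursive = FALSE stops after one level. The variable names nested, flat_all and flat_one are only for illustration.

```r
d1 <- data.frame(y1 = c(1, 2, 3), cluster = c(1, 2, 6))
d2 <- data.frame(y1 = c(3, 2, 1), cluster = c(6, 2, 4))
my.list <- list(d1, d2)

# Nested result: a list of 2 lists, each holding the per-cluster data frames
nested <- lapply(my.list, function(x) split(x, x$cluster))

# Default recursive = TRUE flattens all the way down to an atomic vector,
# so the data frames are dissolved into their individual cell values
flat_all <- unlist(nested)
is.atomic(flat_all)   # TRUE
length(flat_all)      # 12: 6 one-row data frames x 2 columns

# recursive = FALSE removes only the outer level and keeps each data frame intact
flat_one <- unlist(nested, recursive = FALSE)
length(flat_one)                                  # 6
all(vapply(flat_one, is.data.frame, logical(1)))  # TRUE
```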
The list names are just the cluster values (so they repeat across data frames); you can drop them with unname(my.list2).
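If you want names like those in the desired output (df1_cl1, df1_cl2, ...), one option is to rename each split list before flattening. This is a minimal sketch: the df<i>_cl<cluster> pattern is taken from the question, and the helper names pieces and s are hypothetical.

```r
d1 <- data.frame(y1 = c(1, 2, 3), cluster = c(1, 2, 6))
d2 <- data.frame(y1 = c(3, 2, 1), cluster = c(6, 2, 4))
my.list <- list(d1, d2)

# Split each data frame, then prefix the cluster names with the data frame's
# position in my.list (hypothetical "df<i>_cl<cluster>" naming scheme)
pieces <- lapply(seq_along(my.list), function(i) {
  s <- split(my.list[[i]], my.list[[i]]$cluster)
  names(s) <- paste0("df", i, "_cl", names(s))
  s
})

# Remove one level of nesting; the inner names are carried over
my.list2 <- unlist(pieces, recursive = FALSE)
names(my.list2)
# [1] "df1_cl1" "df1_cl2" "df1_cl6" "df2_cl2" "df2_cl4" "df2_cl6"
```

Note that split() orders the pieces by sorted cluster value, so the second data frame yields df2_cl2, df2_cl4, df2_cl6 rather than the 6, 2, 4 order listed in the question.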
Answer:
Another possible solution, based on dplyr::group_split and purrr::map:
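library(tidyverse)  # attaches dplyr (group_split) and purrr (map, flatten)
# group_split() keeps the grouping column by default; flatten() removes one list level
# (flatten() is superseded by list_flatten() in purrr >= 1.0.0)
map(my.list, ~ group_split(.x, cluster)) %>% flatten()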
#> [[1]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 1 1
#>
#> [[2]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 2 2
#>
#> [[3]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 3 6
#>
#> [[4]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 2 2
#>
#> [[5]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 1 4
#>
#> [[6]]
#> # A tibble: 1 × 2
#> y1 cluster
#> <dbl> <dbl>
#> 1 3 6
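Here flatten() plays the same role as unlist(..., recursive = FALSE) in the first answer: it removes exactly one level of nesting. For comparison, a base R sketch of that step, assuming no packages are loaded, can use do.call(c, ...), since c() applied to several lists concatenates their elements into one list. The name nested is only for illustration.

```r
d1 <- data.frame(y1 = c(1, 2, 3), cluster = c(1, 2, 6))
d2 <- data.frame(y1 = c(3, 2, 1), cluster = c(6, 2, 4))
my.list <- list(d1, d2)

# One list of per-cluster data frames for each input data frame
nested <- lapply(my.list, function(x) split(x, x$cluster))

# c(nested[[1]], nested[[2]]) joins the two lists element by element,
# giving one flat list of 6 data frames (named by cluster value)
my.list2 <- do.call(c, nested)
length(my.list2)  # 6
```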