Convert a grouped data.frame into a list in R dplyr

Time:10-22

I have a large data.frame that looks like this. I want to group it by tissue and create a list with one data frame per tissue.

library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))
df
#> # A tibble: 4 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 A      CD79B
#> 2 A      CD79A
#> 3 B      CD19 
#> 4 B      CD180

Created on 2022-10-21 with reprex v2.0.2

I want my data to look like this

#> [[1]]
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 A      CD79B
#> 2 A      CD79A
#> 
#> [[2]]
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 B      CD19 
#> 2 B      CD180

What have I tried so far? I have used group_map but I am missing the tissue column!

library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))
  

df1 <- df |> 
  group_by(tissue) |> 
  group_map(~.)

df1  
#> [[1]]
#> # A tibble: 2 × 1
#>   genes
#>   <chr>
#> 1 CD79B
#> 2 CD79A
#> 
#> [[2]]
#> # A tibble: 2 × 1
#>   genes
#>   <chr>
#> 1 CD19 
#> 2 CD180

Created on 2022-10-21 with reprex v2.0.2

Any help or guidance is appreciated.

CodePudding user response:

Are you looking for split()? The function splits your dataframe into a list based on a variable.

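library(tidyverse)

# split the data frame by 'tissue'; the result is a named list with one
# element per tissue (stored under a new name so 'df' is not overwritten)
df_list <- split(df, df$tissue)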

Let me know if that works for you!
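
As a small illustration of what split() returns, here is a minimal sketch using the example data from the question. The name by_tissue is only a placeholder chosen for this example. The list elements are named after the values of the splitting variable, so each tissue can be pulled out by name:

```r
library(tibble)

# Example data from the question
df <- tibble(tissue = c("A", "A", "B", "B"),
             genes  = c("CD79B", "CD79A", "CD19", "CD180"))

# 'by_tissue' is a hypothetical name for the resulting list
by_tissue <- split(df, df$tissue)

# The list is named after the unique values of 'tissue'
names(by_tissue)
#> [1] "A" "B"

# Each element keeps both columns, so the genes for one tissue are easy to reach
by_tissue[["A"]]$genes
#> [1] "CD79B" "CD79A"
```

Because the elements are named, this form is handy when you later need to look up a tissue directly instead of by position.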

CodePudding user response:

Here are three ways to achieve the same result. Posting this answer since you are asking about a base R native pipe solution:

library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))

# base R, no pipe
split(df, df$tissue)
#> $A
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 A      CD79B
#> 2 A      CD79A
#> 
#> $B
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 B      CD19 
#> 2 B      CD180

# base R with pipe
df |> {\(.) split(., .$tissue)}()

# or
df |> (\(.) split(., .$tissue))()

#> $A
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 A      CD79B
#> 2 A      CD79A
#> 
#> $B
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 B      CD19 
#> 2 B      CD180

# dplyr
df %>% 
  group_split(tissue)
#> <list_of<
#>   tbl_df<
#>     tissue: character
#>     genes : character
#>   >
#> >[2]>
#> [[1]]
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 A      CD79B
#> 2 A      CD79A
#> 
#> [[2]]
#> # A tibble: 2 × 2
#>   tissue genes
#>   <chr>  <chr>
#> 1 B      CD19 
#> 2 B      CD180
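
If you would rather stay with group_map() from the question, the grouping column is dropped because its .keep argument defaults to FALSE. The sketch below, under the assumption of a reasonably recent dplyr, sets .keep = TRUE and then names the list from group_keys(). The names grouped and res are placeholders for this example only:

```r
library(dplyr)

# Example data from the question
df <- tibble(tissue = c("A", "A", "B", "B"),
             genes  = c("CD79B", "CD79A", "CD19", "CD180"))

# 'grouped' and 'res' are hypothetical names used for illustration
grouped <- df |> group_by(tissue)

# .keep = TRUE passes the grouping column along to each per-group tibble
res <- grouped |> group_map(~ .x, .keep = TRUE)

# group_map() returns an unnamed list; group_keys() gives one row per group,
# in the same order, so its 'tissue' column can serve as the names
names(res) <- group_keys(grouped)$tissue

res$A
```

The naming step is optional; it just makes the result look like the output of split().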