I have a large data.frame that looks like this. I want to group the data.frame by tissue and create a list with one element per tissue.
library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))
df
#> # A tibble: 4 × 2
#> tissue genes
#> <chr> <chr>
#> 1 A CD79B
#> 2 A CD79A
#> 3 B CD19
#> 4 B CD180
Created on 2022-10-21 with reprex v2.0.2
I want my data to look like this
#> [[1]]
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 A CD79B
#> 2 A CD79A
#>
#> [[2]]
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 B CD19
#> 2 B CD180
What have I tried so far?
I have used group_map, but I am missing the tissue column!
library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))
df1 <- df |>
group_by(tissue) |>
group_map(~.)
df1
#> [[1]]
#> # A tibble: 2 × 1
#> genes
#> <chr>
#> 1 CD79B
#> 2 CD79A
#>
#> [[2]]
#> # A tibble: 2 × 1
#> genes
#> <chr>
#> 1 CD19
#> 2 CD180
Any help or guidance is appreciated.
CodePudding user response:
Are you looking for split()? The function splits your data frame into a list based on a variable.
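library(tidyverse) # only needed for the tibble from the question; split() itself is base R
# splits the data frame by 'tissue' and returns a named list, one element per tissue
df <- split(df, df$tissue)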
Let me know if that works for you!
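To make the shape of the result concrete, here is a minimal sketch of what split() gives back and how to reach into it. It rebuilds the toy data from the question as a plain data.frame so it runs without any packages; the name by_tissue is just an illustrative choice, not something from the question.

```r
# Toy data from the question, rebuilt so the snippet runs on its own
df <- data.frame(
  tissue = c("A", "A", "B", "B"),
  genes  = c("CD79B", "CD79A", "CD19", "CD180"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per tissue value
# (by_tissue is a hypothetical name used only for this example)
by_tissue <- split(df, df$tissue)

names(by_tissue)     # list names come from the tissue values
by_tissue[["B"]]     # pull out a single tissue by name
by_tissue$A$genes    # each piece still carries both columns
```

Because the list is named after the grouping values, you can look up a tissue directly instead of relying on its position.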
CodePudding user response:
Here are 3 ways to achieve the same result. Posting this answer since you are asking about a base R native pipe solution:
library(tidyverse)
df <- tibble(tissue=c("A","A","B","B"), genes=c('CD79B','CD79A','CD19','CD180'))
# base R, no pipe
split(df, df$tissue)
#> $A
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 A CD79B
#> 2 A CD79A
#>
#> $B
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 B CD19
#> 2 B CD180
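# base R with the native pipe (R >= 4.1), wrapping split() in an anonymous function
df |> {\(.) split(., .$tissue)}()
# or, equivalently, with parentheses around the anonymous function
df |> (\(.) split(., .$tissue))()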
#> $A
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 A CD79B
#> 2 A CD79A
#>
#> $B
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 B CD19
#> 2 B CD180
# dplyr
df %>%
group_split(tissue)
#> <list_of<
#> tbl_df<
#> tissue: character
#> genes : character
#> >
#> >[2]>
#> [[1]]
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 A CD79B
#> 2 A CD79A
#>
#> [[2]]
#> # A tibble: 2 × 2
#> tissue genes
#> <chr> <chr>
#> 1 B CD19
#> 2 B CD180
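Since the question started from group_map(), here is a hedged sketch of keeping the grouping column there as well. It assumes a dplyr version in which group_map() accepts the .keep argument (it defaults to FALSE, which is why tissue disappeared in the original attempt). The name kept is only illustrative.

```r
library(dplyr)

# Toy data from the question
df <- tibble(
  tissue = c("A", "A", "B", "B"),
  genes  = c("CD79B", "CD79A", "CD19", "CD180")
)

# .keep = TRUE asks group_map() to leave the grouping column in each chunk;
# ~ .x simply returns each group's tibble unchanged
kept <- df |>
  group_by(tissue) |>
  group_map(~ .x, .keep = TRUE)

kept[[1]]          # first group, now with the tissue column
names(kept[[1]])   # column names of the first group
```

Like group_split(), this returns an unnamed list, so split() remains the simpler choice if you want the pieces named by tissue.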