How to unnest a data frame containing list of list with varied length?

Time:06-26

I am trying to unnest the following data frame.

df.org <- structure(list(Gene = "ARIH1", Description = "E3 ubiquitin-protein ligase ARIH1", 
    condition2_cellline = list(c("MCF7", "Jurkat")), condition2_activity = list(
        c(40.8284023668639, 13.26973)), condition2_concentration = list(
        c("100uM", "100uM")), condition3_cellline = list("Jurkat"), 
    condition3_activity = list(-4.60251), condition3_concentration = list(
        "100uM")), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"))

This is my code:

df.output <- df.org %>% 
  unnest(where(is.list), keep_empty = TRUE)

This is what I got:

structure(list(Gene = c("ARIH1", "ARIH1"), Description = c("E3 ubiquitin-protein ligase ARIH1", 
"E3 ubiquitin-protein ligase ARIH1"), condition2_cellline = c("MCF7", 
"Jurkat"), condition2_activity = c(40.8284023668639, 13.26973
), condition2_concentration = c("100uM", "100uM"), condition3_cellline = c("Jurkat", 
"Jurkat"), condition3_activity = c(-4.60251, -4.60251), condition3_concentration = c("100uM", 
"100uM")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

Is there a way to avoid duplicating the values of the shorter list columns and fill the gaps with NA instead? The following output is what I want to get.

df.desired <- structure(list(Gene = c("ARIH1", "ARIH1"), Description = c("E3 ubiquitin-protein ligase ARIH1", 
"E3 ubiquitin-protein ligase ARIH1"), condition2_cellline = c("MCF7", 
"Jurkat"), condition2_activity = c(40.8284023668639, 13.26973
), condition2_concentration = c("100uM", "100uM"), condition3_cellline = c(NA, 
"Jurkat"), condition3_activity = c(NA, -4.60251), condition3_concentration = c(NA, 
"100uM")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

Thanks so much for any help!

CodePudding user response:

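We could also do this without reshaping: get the maximum element length across the list columns, loop across those list columns to extend every element to that length with `length<-`, and then use unnest. Note that `length<-` pads with NA at the end, so the NA values land in the last rows.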

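library(dplyr)
library(purrr)
library(tidyr)

df.org %>% 
  # l1: longest element found in any list column
  mutate(l1 = max(across(where(is.list), lengths)),
         # pad every list element with NA up to l1, then drop the helper column
         across(where(is.list), ~ map(.x, `length<-`, l1)), l1 = NULL) %>% 
  unnest(where(is.list), keep_empty = TRUE)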

Output:

# A tibble: 2 × 8
  Gene  Description                       condition2_cellline condition2_activity condition2_concentration condition3_cellline condition3_activity condition3_concentration
  <chr> <chr>                             <chr>                             <dbl> <chr>                    <chr>                             <dbl> <chr>                   
1 ARIH1 E3 ubiquitin-protein ligase ARIH1 MCF7                               40.8 100uM                    Jurkat                            -4.60 100uM                   
2 ARIH1 E3 ubiquitin-protein ligase ARIH1 Jurkat                             13.3 100uM                    <NA>                              NA    <NA>                  
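
The padding step relies on the base R replacement function `length<-` being called like an ordinary function. As a minimal sketch (the variable names `padded` and `truncated` are only illustrative), this is how it behaves on plain vectors:

```r
# `length<-` called as a regular function: extend a vector to length 2.
# The missing position is filled with NA at the end.
padded <- `length<-`(c("Jurkat"), 2)
print(padded)

# Shortening works the same way and drops the trailing elements.
truncated <- `length<-`(c(40.8, 13.3), 1)
print(truncated)
```

Because the NA is always appended, a shorter column ends up with its values in the first rows after unnesting.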

CodePudding user response:

Here is a suggestion for how it could work:

  1. Pivot all list columns to long format with pivot_longer.
  2. Apply `length<-` so that every list element has the same length.
  3. Pivot back to wide format and unnest.

library(dplyr)
library(tidyr)

df.org %>% 
  pivot_longer(cols = starts_with("condition")) %>% 
  mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest(cols = c(condition2_cellline, condition2_activity, condition2_concentration, 
                  condition3_cellline, condition3_activity, condition3_concentration)) 

  Gene  Description        condition2_cell~ condition2_acti~ condition2_conc~ condition3_cell~ condition3_acti~ condition3_conc~
  <chr> <chr>              <chr>                       <dbl> <chr>            <chr>                       <dbl> <chr>           
1 ARIH1 E3 ubiquitin-prot~ MCF7                         40.8 100uM            Jurkat                      -4.60 100uM           
2 ARIH1 E3 ubiquitin-prot~ Jurkat                       13.3 100uM            NA                          NA    NA              
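
Both outputs shown here place the NA values in the last row, whereas df.desired in the question has them in the first row. If the NA values should come first, one possible variation is to pad on the left instead. This is only a sketch under that assumption: the question does not say whether rows are meant to align by cell line, `pad_left` is a hypothetical helper, and `df.small` is a reduced stand-in for df.org.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: put the NA values in front of the existing values.
pad_left <- function(x, n) c(rep(NA, n - length(x)), x)

# Reduced stand-in for df.org with list columns of different lengths.
df.small <- tibble(
  Gene = "ARIH1",
  condition2_cellline = list(c("MCF7", "Jurkat")),
  condition3_cellline = list("Jurkat"),
  condition3_activity = list(-4.60251)
)

# Longest element found in any list column.
is_list_col <- vapply(df.small, is.list, logical(1))
n_max <- max(unlist(lapply(df.small[is_list_col], lengths)))

result <- df.small %>%
  mutate(across(where(is.list), ~ map(.x, pad_left, n = n_max))) %>%
  unnest(where(is.list))

print(result)
```

Left-padding only reproduces df.desired when the shorter columns belong to the trailing rows; aligning by an actual key such as the cell line would need a join or reshape instead.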