Base R: convert nested list with different names to data.frame filling NA and adding column

Time:05-11

I need a base R solution to convert a nested list whose elements have different names to a data.frame:

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z=list('k')))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    <NULL>   
##     3    NA    <NULL>   
##    NA     5    <NULL>   
##     9    NA    <chr [1]>

I know this could easily be done with dplyr::bind_rows, but I need a solution in base R. To simplify the problem, a solution that only handles a 2-level nested list with no 3rd-level lists is also fine, such as

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    NA   
##     3    NA    NA   
##    NA     5    NA   
##     9    NA    k  

I have tried something like

convert <- function(L) as.data.frame(do.call(rbind, L))

This neither fills the missing values with NA nor adds the additional column z.
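
To see why the attempt above falls short, it can help to inspect the intermediate result. The following is a minimal sketch using the 2-level mylist from the question; the variable name m is only for illustration. rbind() treats each inner list as a plain vector, fills by position, and recycles shorter elements, so the names are never matched up.

```r
mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

# rbind() on plain lists builds a list-matrix, filling by position
m <- do.call(rbind, mylist)

dim(m)      # 4 rows, but only 2 columns: there is no separate column for z
m[[3, 1]]   # the b = 5 of the third element lands in the first column
m[[2, 2]]   # list(a=3) is recycled, so its value also fills the second column
```

Any base R fix therefore has to line the elements up by name before binding them.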

CodePudding user response:

You can do something like the following:

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

convert <- function(mylist){
  # get all the unique names that appear in any element
  col_names <- unique(unlist(lapply(mylist, names)))
  # create an all-NA df with one row per element
  df <- data.frame(matrix(ncol=length(col_names),
                          nrow=length(mylist)))
  colnames(df) <- col_names
  
  # join data to row in df; seq_along() and names() also cope with
  # empty lists, where 1:length(x) would count down from 1 to 0
  for(i in seq_along(mylist)){
    for(nm in names(mylist[[i]])){
      df[i, nm] <- mylist[[i]][nm]
    }
  }
  return(df)
}

df <- convert(mylist)
df
##    a  b    z
## 1  1  2 <NA>
## 2  3 NA <NA>
## 3 NA  5 <NA>
## 4  9 NA    k
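
The same two steps (collect the union of names, then fill in NA) can also be written without explicit loops by building one column at a time. This is only a sketch: the name convert_cols is made up for illustration, and it assumes every value is a length-1 atomic vector, as in the 2-level example.

```r
mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

# convert_cols is a hypothetical name; assumes length-1 atomic values
convert_cols <- function(L) {
  # union of all names, in order of first appearance
  col_names <- unique(unlist(lapply(L, names)))
  cols <- lapply(col_names, function(nm) {
    # x[[nm]] is NULL when the name is absent, so substitute NA there
    vals <- lapply(L, function(x) if (is.null(x[[nm]])) NA else x[[nm]])
    unlist(vals)
  })
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

res <- convert_cols(mylist)
res
```

Building by column lets unlist() pick a single type per column, so a and b come out numeric and z comes out character.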

CodePudding user response:

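I've got a solution. Note that its only non-base dependency is the magrittr pipe (%>%). It could be ported to the native pipe, but the . placeholder and the . %>% shorthand would then have to be rewritten as explicit functions.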

library(magrittr)  # provides %>%

mylist %>% 
  #' first, ensure that the 2nd level is flat,
  lapply(. %>% lapply(FUN = unlist, recursive = FALSE)) %>%
  #' replace missing vars with `NA` and put every element in the same
  #' order, because rbind() matches list elements by position, not by name
  lapply(function(x, vars) {
    x[vars[!vars %in% names(x)]] <- NA
    x[vars]
  }, vars = {.} %>% unlist() %>% names() %>% unique()) %>%
  do.call(what = rbind) %>%
  #' do nothing
  identity()

CodePudding user response:

A shorter solution in base R, which hard-codes the expected column names a, b and z, would be

make_df <- function(a = NA, b = NA, z = NA) {
  data.frame(a = unlist(a), b = unlist(b), z = unlist(z))
}

do.call(rbind, lapply(mylist, function(x) do.call(make_df, x)))
#>    a  b    z
#> 1  1  2 <NA>
#> 2  3 NA <NA>
#> 3 NA  5 <NA>
#> 4  9 NA    k

Created on 2022-05-10 by the reprex package (v2.0.1)
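
Since make_df above names its columns explicitly, it only works when the set of names is known in advance. A possible generalisation, sketched here under the same assumption of length-1 values, derives the names from the data instead. The name convert_rows is hypothetical and not part of the answer above.

```r
mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

# convert_rows is a hypothetical name; assumes length-1 atomic values
convert_rows <- function(L) {
  # union of all names, in order of first appearance
  vars <- unique(unlist(lapply(L, names)))
  rows <- lapply(L, function(x) {
    x[setdiff(vars, names(x))] <- NA                  # add missing fields as NA
    as.data.frame(x[vars], stringsAsFactors = FALSE)  # one-row df, fixed column order
  })
  # rbind() on data frames matches columns by name
  do.call(rbind, rows)
}

res <- convert_rows(mylist)
res
```

Converting each element to a one-row data.frame first is what lets rbind() line the columns up by name, at the cost of creating many small data frames for long lists.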
