How to easily rbind nested lists with different depths?

I have a dataset where some of the data is in a nested list. The problem is that the depth of the lists varies, and it is impossible to know the depths in advance.

Here is an example (note: in the original dataset, none of the lists are named):

list1 <- list(
  list("data1"), 
  list(
    list("data2")
  ), 
  list(c("data3", "data4", "data5")),
  list(
    list(
      "data6"
    )
  ), 
  list(c("data7", "data8")), 
  list(
    list(
      list(c("data9", "data10", "data11", "data12"))
    )
  ),
  list("data13")
)

My goal is to extract all the "data..." values from this list into a new data frame (extract in the sense that I want to rbind them):

nrows	data
1	data1
2	data2
3	data3, data4, data5
4	data6
5	data7, data8
6	data9, data10, data11, data12
7	data13

I tried many options (repeated unlist/map/lapply(list1, function(l) l[[1]]), then bind_rows or do.call(rbind, ...), etc.), but none of them worked. What bothers me most is that I don't understand why I can't solve it.
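A quick way to see why a fixed number of `[[1]]` extractions cannot work is to measure how deeply each element is nested. Below is a minimal sketch; `depth()` is a hypothetical helper written only for this illustration, not a base R function.

```r
# Same structure as list1 above, written compactly
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

# Hypothetical helper: number of list levels wrapping the character leaves
depth <- function(x) {
  if (!is.list(x)) return(0)
  1 + max(0, vapply(x, depth, numeric(1)))
}

sapply(list1, depth)
#> [1] 1 2 1 2 1 3 1

# One round of l[[1]] strips exactly one level, so the deeper elements
# are still lists afterwards and cannot be bound as plain rows
stripped <- lapply(list1, function(l) l[[1]])
sapply(stripped, is.list)
#> [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
```

Because the depths differ (1, 2 or 3 here), any approach that peels off a fixed number of levels leaves some elements still wrapped in lists. A recursive flattening step avoids this.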

So my question is the following: what is the easiest way to rbind nested lists with different depths?

Answer:

Use lapply() to apply a recursive unlist() to each element, then pass the resulting list to tibble() as a convenient way to get it into a data frame (a tibble is a type of data frame).

list1 <- list(
  list("data1"), 
  list(
    list("data2")
  ), 
  list(c("data3", "data4", "data5")),
  list(
    list(
      "data6"
    )
  ), 
  list(c("data7", "data8")), 
  list(
    list(
      list(c("data9", "data10", "data11", "data12"))
    )
  ),
  list("data13")
)

## consider piping to as.data.frame if you specifically 
## want that format
new_list <- lapply(list1, unlist, recursive = TRUE)

dplyr::tibble(data = new_list)
#> # A tibble: 7 × 1
#>   data     
#>   <list>   
#> 1 <chr [1]>
#> 2 <chr [1]>
#> 3 <chr [3]>
#> 4 <chr [1]>
#> 5 <chr [2]>
#> 6 <chr [4]>
#> 7 <chr [1]>

Created on 2022-12-14 with reprex v2.0.2
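To see why a recursive unlist() copes with any depth, here is a hand-rolled equivalent for this data. It is a sketch only: `flatten()` is a hypothetical helper, and it assumes unnamed lists whose leaves are character vectors (unlist() itself also handles names and mixed types).

```r
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

# Hypothetical helper: keep recursing until a non-list leaf is reached,
# then concatenate the leaves on the way back up. The recursion stops on
# its own, so the nesting depth never has to be known in advance.
flatten <- function(x) {
  if (!is.list(x)) return(x)
  do.call(c, lapply(x, flatten))
}

flat <- lapply(list1, flatten)
identical(flat, lapply(list1, unlist))
#> [1] TRUE
```

Note that recursive = TRUE is already unlist()'s default, so lapply(list1, unlist) gives the same result as the call in the answer.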

Answer:

Call unlist() on each sublist and the rest is easy. To turn each element into a single comma-separated string, so that the data column is a character vector, apply toString():

data.frame(
  nr = seq_along(list1),
  # \(x) is the lambda shorthand for function(x), available since R 4.1.0
  data = sapply(list1, \(x) toString(unlist(x)))
)


#   nr                          data
# 1  1                         data1
# 2  2                         data2
# 3  3           data3, data4, data5
# 4  4                         data6
# 5  5                  data7, data8
# 6  6 data9, data10, data11, data12
# 7  7                        data13
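A slightly stricter variant of the same idea uses vapply() instead of sapply(). This is a sketch, not part of the original answer: FUN.VALUE = character(1) makes R verify that every element collapses to exactly one string, and function(x) is used instead of \(x) so it also runs on R versions older than 4.1.0.

```r
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

df <- data.frame(
  nr = seq_along(list1),
  # vapply() errors if any element does not yield exactly one string
  data = vapply(list1, function(x) toString(unlist(x)), character(1)),
  stringsAsFactors = FALSE  # already the default since R 4.0.0
)
df$data[6]
#> [1] "data9, data10, data11, data12"
```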

Answer:

1) Assuming you want a column whose elements are character vectors, try this:

DF <- data.frame(nrows = seq_along(list1))
DF$data <- lapply(list1, unlist)

str(DF)
## 'data.frame':   7 obs. of  2 variables:
##  $ nrows: int  1 2 3 4 5 6 7
##  $ data :List of 7
##   ..$ : chr "data1"
##   ..$ : chr "data2"
##   ..$ : chr  "data3" "data4" "data5"
##   ..$ : chr "data6"
##   ..$ : chr  "data7" "data8"
##   ..$ : chr  "data9" "data10" "data11" "data12"
##   ..$ : chr "data13"

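1a) This variation also works. Wrapping the list in I(...) is critical here: without it, data.frame() tries to spread the list into separate columns, one per element, and fails because those elements have different lengths.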

DF <- data.frame(nrows = seq_along(list1), data = I(lapply(list1, unlist)))
str(DF)
## 'data.frame':   7 obs. of  2 variables:
##  $ nrows: int  1 2 3 4 5 6 7
##  $ data :List of 7
##   ..$ : chr "data1"
##   ..$ : chr "data2"
##   ..$ : chr  "data3" "data4" "data5"
##   ..$ : chr "data6"
##   ..$ : chr  "data7" "data8"
##   ..$ : chr  "data9" "data10" "data11" "data12"
##   ..$ : chr "data13"
##   ..- attr(*, "class")= chr "AsIs"
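The failure without I() can be reproduced directly. A minimal sketch; the exact wording of the error message depends on the R version, so it is only printed, not asserted.

```r
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)
vals <- lapply(list1, unlist)

# Without I(): the list is treated as a set of columns (lengths 1 to 4),
# which cannot be lined up into a common number of rows
res <- tryCatch(
  data.frame(nrows = seq_along(list1), data = vals),
  error = function(e) e
)
inherits(res, "error")
#> [1] TRUE
conditionMessage(res)

# With I(): the list is kept as one list-column with 7 entries
ok <- data.frame(nrows = seq_along(list1), data = I(vals))
nrow(ok)
#> [1] 7
```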

2) If you instead want a character column in which each element is a single string, separated either by a comma and a space or by a comma only, run one of the following after running (1) above:

DF$data <- sapply(DF$data, toString)
str(DF)
## 'data.frame':   7 obs. of  2 variables:
##  $ nrows: int  1 2 3 4 5 6 7
##  $ data : chr  "data1" "data2" "data3, data4, data5" "data6" ...

DF$data <- sapply(DF$data, paste, collapse = ",")
str(DF)
## 'data.frame':   7 obs. of  2 variables:
##  $ nrows: int  1 2 3 4 5 6 7
##  $ data : chr  "data1" "data2" "data3,data4,data5" "data6" ...
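The two options in (2) differ only in the separator: by default, toString() on a character vector joins the elements with ", ", just like paste(x, collapse = ", "). A small check on one element:

```r
x <- c("data3", "data4", "data5")

# toString() uses ", " as its default separator
identical(toString(x), paste(x, collapse = ", "))
#> [1] TRUE

# paste(collapse = ",") joins with a bare comma instead
paste(x, collapse = ",")
#> [1] "data3,data4,data5"
```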

Answer:

The collapse package has a very nice function for this, unlist2d():

The output of unlist2d(list1) is:

      .id.1 .id.2 .id.3 .id.4     V1     V2     V3     V4
    1     1     1    NA    NA  data1   <NA>   <NA>   <NA>
    2     2     1     1    NA  data2   <NA>   <NA>   <NA>
    3     3     1    NA    NA  data3  data4  data5   <NA>
    4     4     1     1    NA  data6   <NA>   <NA>   <NA>
    5     5     1    NA    NA  data7  data8   <NA>   <NA>
    6     6     1     1     1  data9 data10 data11 data12
    7     7     1    NA    NA data13   <NA>   <NA>   <NA>

unlist2d() also has arguments for controlling the id columns and the row names.
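If you would rather avoid the extra dependency, the value columns V1 to V4 (without the .id columns) can be approximated in base R by flattening each element and padding it with NA to a common length. This is a sketch under the assumption that all leaves are character vectors; it is not how collapse implements unlist2d().

```r
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

vals <- lapply(list1, unlist)   # flatten each element at any depth
n <- max(lengths(vals))         # widest row: 4 values here

# `length<-` pads shorter vectors with NA; rbind() stacks them as rows,
# and as.data.frame() names the unnamed matrix columns V1, V2, ...
wide <- as.data.frame(do.call(rbind, lapply(vals, `length<-`, n)))
wide
```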