I have a dataset where some of the data is stored in nested lists. The problem is that the depth of the nesting varies, and there is no way to know it in advance.
Here is an example (note: in the original dataset, none of the lists are named):
```r
list1 <- list(
  list("data1"),
  list(
    list("data2")
  ),
  list(c("data3", "data4", "data5")),
  list(
    list(
      "data6"
    )
  ),
  list(c("data7", "data8")),
  list(
    list(
      list(c("data9", "data10", "data11", "data12"))
    )
  ),
  list("data13")
)
```
My goal is to extract all the "data..." values from this list into a new data frame, one row per top-level element (i.e. I want to `rbind` them):
| nrows | data |
|---|---|
| 1 | data1 |
| 2 | data2 |
| 3 | data3, data4, data5 |
| 4 | data6 |
| 5 | data7, data8 |
| 6 | data9, data10, data11, data12 |
| 7 | data13 |
I tried many options (repeated `unlist`, `map`, `lapply(list1, function(l) l[[1]])` followed by `bind_rows` or `do.call(rbind, ...)`, etc.), but none of them worked. What bothers me most is that I don't understand why I can't solve it.
So my question is the following: what is the easiest way to rbind nested lists with different depths?
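To see why fixed-depth approaches such as `lapply(list1, function(l) l[[1]])` break, here is a short check (a sketch added for illustration; it re-creates `list1` compactly so it runs on its own). Peeling off exactly one level leaves some elements as character vectors and others still as lists, whereas `unlist()` is recursive by default and reaches the leaves at any depth:

```r
# The nested list from the question, written compactly
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

# Removing one level of nesting gives a mix of character vectors and lists,
# because the elements are nested to different depths
one_level <- lapply(list1, function(l) l[[1]])
sapply(one_level, is.list)
#> [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

# unlist() recurses all the way down, so every element ends up flat
sapply(list1, function(l) length(unlist(l)))
#> [1] 1 1 3 1 2 4 1
```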
Answer:
You can `lapply()` a recursive `unlist()` over the top-level list and then pass the resulting list to `tibble()`, which is a convenient way to get it into a data frame (a tibble is a type of data frame).
```r
list1 <- list(
  list("data1"),
  list(
    list("data2")
  ),
  list(c("data3", "data4", "data5")),
  list(
    list(
      "data6"
    )
  ),
  list(c("data7", "data8")),
  list(
    list(
      list(c("data9", "data10", "data11", "data12"))
    )
  ),
  list("data13")
)
## consider piping to as.data.frame if you specifically
## want that format
new_list <- lapply(list1, unlist, recursive = TRUE)
dplyr::tibble(data = new_list)
#> # A tibble: 7 × 1
#> data
#> <list>
#> 1 <chr [1]>
#> 2 <chr [1]>
#> 3 <chr [3]>
#> 4 <chr [1]>
#> 5 <chr [2]>
#> 6 <chr [4]>
#> 7 <chr [1]>
```
Created on 2022-12-14 with reprex v2.0.2
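To make the "recursive unlist" step concrete, here is a sketch of what it amounts to for this data. `flatten_chr()` is a hypothetical helper written only for illustration; it assumes every leaf is an atomic vector (character here) and ignores names, which `unlist()` handles more generally, so in practice you would keep calling `unlist()`.

```r
# The question's example data, written compactly
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

# Hypothetical helper (not from any package): returns the leaves of an
# arbitrarily nested list as one flat character vector, in order
flatten_chr <- function(x) {
  # Base case: a leaf (an atomic vector) is returned as character
  if (!is.list(x)) return(as.character(x))
  # Recursive case: flatten each child, then concatenate the pieces
  Reduce(c, lapply(x, flatten_chr), character(0))
}

flat <- lapply(list1, flatten_chr)
identical(flat, lapply(list1, unlist))
#> [1] TRUE
```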
Answer:
`unlist()` each sublist and the rest is easy. To get the data as a character column, with each row's values collapsed into a single string, apply `toString()` as well:
```r
# \(x) is the anonymous-function shorthand available from R 4.1.0
data.frame(
  nr = seq_along(list1),
  data = sapply(list1, \(x) toString(unlist(x)))
)
# nr data
# 1 1 data1
# 2 2 data2
# 3 3 data3, data4, data5
# 4 4 data6
# 5 5 data7, data8
# 6 6 data9, data10, data11, data12
# 7 7 data13
```
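A slightly stricter variant of the same approach (a sketch, swapping in `vapply()` for `sapply()`): `vapply()` raises an error if any element does not collapse to exactly one string, so `data` is guaranteed to be a plain character column. It also uses `function(x)` instead of the `\(x)` shorthand, so it runs on R versions older than 4.1.

```r
# The question's example data, written compactly
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)

# vapply() checks that every call returns exactly one string
DF2 <- data.frame(
  nr = seq_along(list1),
  data = vapply(list1, function(x) toString(unlist(x)), character(1)),
  stringsAsFactors = FALSE  # keep a character column on R < 4.0 as well
)
DF2$data[6]
#> [1] "data9, data10, data11, data12"
```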
Answer:
1) Assuming you want a column whose elements are character vectors, try this:
```r
DF <- data.frame(nrows = seq_along(list1))
DF$data <- lapply(list1, unlist)
str(DF)
## 'data.frame': 7 obs. of 2 variables:
## $ nrows: int 1 2 3 4 5 6 7
## $ data :List of 7
## ..$ : chr "data1"
## ..$ : chr "data2"
## ..$ : chr "data3" "data4" "data5"
## ..$ : chr "data6"
## ..$ : chr "data7" "data8"
## ..$ : chr "data9" "data10" "data11" "data12"
## ..$ : chr "data13"
```
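1a) This variation also works. Wrapping the list in `I(...)` is critical here: without it, `data.frame()` tries to turn the list's elements into separate columns and fails with an error because they have different lengths.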
```r
DF <- data.frame(nrows = seq_along(list1), data = I(lapply(list1, unlist)))
str(DF)
## 'data.frame': 7 obs. of 2 variables:
## $ nrows: int 1 2 3 4 5 6 7
## $ data :List of 7
## ..$ : chr "data1"
## ..$ : chr "data2"
## ..$ : chr "data3" "data4" "data5"
## ..$ : chr "data6"
## ..$ : chr "data7" "data8"
## ..$ : chr "data9" "data10" "data11" "data12"
## ..$ : chr "data13"
## ..- attr(*, "class")= chr "AsIs"
```
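The `I()` requirement can be checked directly. The sketch below (re-creating `list1` compactly so it runs on its own) catches the error raised without `I()` and shows that the wrapped version yields a 7-row data frame with a list column:

```r
# The question's example data, written compactly
list1 <- list(
  list("data1"), list(list("data2")), list(c("data3", "data4", "data5")),
  list(list("data6")), list(c("data7", "data8")),
  list(list(list(c("data9", "data10", "data11", "data12")))), list("data13")
)
data_list <- lapply(list1, unlist)

# Without I(), data.frame() treats the list as a set of columns; their lengths
# (1, 1, 3, 1, 2, 4, 1) cannot be recycled to a common number of rows
res <- tryCatch(
  data.frame(nrows = seq_along(list1), data = data_list),
  error = function(e) e
)
inherits(res, "error")
#> [1] TRUE

# With I(), the list is kept intact as a single list column
DF <- data.frame(nrows = seq_along(list1), data = I(data_list))
nrow(DF)
#> [1] 7
```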
2) If you instead want a character column in which each element of `data` is a single string, separated either by comma-space or by comma alone, run one of these after running (1) above:
Comma-space separated:

```r
DF$data <- sapply(DF$data, toString)
str(DF)
## 'data.frame': 7 obs. of 2 variables:
## $ nrows: int 1 2 3 4 5 6 7
## $ data : chr "data1" "data2" "data3, data4, data5" "data6" ...
```

Comma separated:

```r
DF$data <- sapply(DF$data, paste, collapse = ",")
str(DF)
## 'data.frame': 7 obs. of 2 variables:
## $ nrows: int 1 2 3 4 5 6 7
## $ data : chr "data1" "data2" "data3,data4,data5" "data6" ...
```
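For reference, on a character vector `toString()` produces the same string as `paste()` with `collapse = ", "`, so the two options above differ only in the separator. A quick check:

```r
x <- c("data3", "data4", "data5")

# toString() joins the elements with ", "
identical(toString(x), paste(x, collapse = ", "))
#> [1] TRUE

# Any other separator only needs a different collapse string
paste(x, collapse = ",")
#> [1] "data3,data4,data5"
```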
Answer:
There's a very nice function in the collapse package for this: `unlist2d()`. The output of `unlist2d(list1)` is:
```
  .id.1 .id.2 .id.3 .id.4     V1     V2     V3     V4
1     1     1    NA    NA  data1   <NA>   <NA>   <NA>
2     2     1     1    NA  data2   <NA>   <NA>   <NA>
3     3     1    NA    NA  data3  data4  data5   <NA>
4     4     1     1    NA  data6   <NA>   <NA>   <NA>
5     5     1    NA    NA  data7  data8   <NA>   <NA>
6     6     1     1     1  data9 data10 data11 data12
7     7     1    NA    NA data13   <NA>   <NA>   <NA>
```
There are arguments for controlling the id columns and the row names (e.g. `idcols` and `row.names`; see `?collapse::unlist2d`).
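If you start from `unlist2d()` but want the two-column layout from the question, the `V` columns can be pasted back together row by row. The sketch below assumes the layout shown above (an `.id.1` column plus `V1`, `V2`, ... padded with `NA`); `wide_to_strings()` is a hypothetical helper, and the demo rebuilds two rows of that output by hand so it runs without collapse installed:

```r
# Hypothetical helper: collapse the V1, V2, ... columns of unlist2d()-style
# output into one comma-separated string per row, skipping the NA padding
wide_to_strings <- function(wide) {
  # Assumes the value columns are named V1, V2, ... as in the output above
  vals <- as.matrix(wide[grep("^V[0-9]+$", names(wide))])
  data.frame(
    nrows = wide$.id.1,  # assumed top-level id column
    data = apply(vals, 1, function(r) toString(r[!is.na(r)])),
    stringsAsFactors = FALSE
  )
}

# Rows 1 and 3 of the output above, rebuilt by hand for a self-contained demo
wide <- data.frame(
  .id.1 = c(1L, 3L),
  V1 = c("data1", "data3"), V2 = c(NA, "data4"), V3 = c(NA, "data5"),
  stringsAsFactors = FALSE
)
res <- wide_to_strings(wide)
res$data
```

With collapse installed, the intended call would be `wide_to_strings(collapse::unlist2d(list1))`.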