Below is a tibble. I wish to unnest it. The desired tibble should contain 16 rows.
- The first row should expand to 4 rows
- The second row should expand to 2 rows
- The third row should expand to 5*2 rows
The problem here is that in the third row, we have a vector of size 5 and a vector of size 2. I want that row to expand to 5 * 2 = 10 rows.
library(tidyverse)
tbl <- tibble(
a_X_values = list(NA, c(3, NA), c(4, 5, 6, NaN, NaN)),
a_Y_values = list(c(NaN, NaN, 3, 5), NaN, NaN),
b_X_values = list(7, 8, c(3, NaN)),
b_Y_values = c(NA, NaN, 1)
)
tbl1 <- tbl %>%
unnest(where(is.list))
The desired output would be
tbl_desired <- tibble(
a_X_values = c(NA, NA, NA, NA, 3, NA, 4, 5, 6, NaN, NaN, 4, 5, 6, NaN, NaN ),
a_Y_values = c(NaN, NaN, 3, 5, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN),
b_X_values = c(7, 7, 7, 7, 8, 8, 3, NaN, 3, NaN, 3, NaN, 3, NaN, 3, NaN),
b_Y_values = c(NA, NA, NA, NA, NaN, NaN, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
Edit - adding another question -
I have a small request. If you can help me with that. 3 rows were extended to 16 rows, but all of them do not make sense. I am only interested in rows which contain a combination of real numbers. Therefore, the only useful rows are # 3, 4, 5, 7, 9, 11. Is there a way to reduce these rows only?
CodePudding user response:
For sequential unnesting, I'd use a for
loop:
list_cols = sapply(tbl, is.list)
for(col in names(list_cols)[list_cols]){
tbl <- tidyr::unnest(tbl, all_of(col))
}
tbl
# # A tibble: 16 × 4
# a_X_values a_Y_values b_X_values b_Y_values
# <dbl> <dbl> <dbl> <dbl>
# 1 NA NaN 7 NA
# 2 NA NaN 7 NA
# 3 NA 3 7 NA
# 4 NA 5 7 NA
# 5 3 NaN 8 NaN
# 6 NA NaN 8 NaN
# 7 4 NaN 3 1
# 8 4 NaN NaN 1
# 9 5 NaN 3 1
# 10 5 NaN NaN 1
# 11 6 NaN 3 1
# 12 6 NaN NaN 1
# 13 NaN NaN 3 1
# 14 NaN NaN NaN 1
# 15 NaN NaN 3 1
# 16 NaN NaN NaN 1
CodePudding user response:
In tidyverse
, we can use reduce
library(purrr)
library(tidyr)
reduce(1:3, unnest, .init = tbl)
-output
# A tibble: 16 × 4
a_X_values a_Y_values b_X_values b_Y_values
<dbl> <dbl> <dbl> <dbl>
1 NA NaN 7 NA
2 NA NaN 7 NA
3 NA 3 7 NA
4 NA 5 7 NA
5 3 NaN 8 NaN
6 NA NaN 8 NaN
7 4 NaN 3 1
8 4 NaN NaN 1
9 5 NaN 3 1
10 5 NaN NaN 1
11 6 NaN 3 1
12 6 NaN NaN 1
13 NaN NaN 3 1
14 NaN NaN NaN 1
15 NaN NaN 3 1
16 NaN NaN NaN 1
If it needs to pick the list columns automatically, find the column index or column names where the columns are list (from select
) and use that in the reduce
library(dplyr)
tbl %>%
select(where(is.list)) %>%
names %>%
reduce(unnest, .init = tbl)
# A tibble: 16 × 4
a_X_values a_Y_values b_X_values b_Y_values
<dbl> <dbl> <dbl> <dbl>
1 NA NaN 7 NA
2 NA NaN 7 NA
3 NA 3 7 NA
4 NA 5 7 NA
5 3 NaN 8 NaN
6 NA NaN 8 NaN
7 4 NaN 3 1
8 4 NaN NaN 1
9 5 NaN 3 1
10 5 NaN NaN 1
11 6 NaN 3 1
12 6 NaN NaN 1
13 NaN NaN 3 1
14 NaN NaN NaN 1
15 NaN NaN 3 1
16 NaN NaN NaN 1
If we need to subset the rows based on the number of non-NA elements
reduce(1:3, unnest, .init = tbl) %>%
filter(rowSums(!is.na(.[1:3])) > 1)
# A tibble: 6 × 4
a_X_values a_Y_values b_X_values b_Y_values
<dbl> <dbl> <dbl> <dbl>
1 NA 3 7 NA
2 NA 5 7 NA
3 3 NaN 8 NaN
4 4 NaN 3 1
5 5 NaN 3 1
6 6 NaN 3 1
Or if we need to do this programmatically
library(pipeR)
library(magrittr)
tbl %>%
select(where(is.list)) %>%
names %>>%
(~ tmp) %>%
reduce(unnest, .init = tbl) %>%
filter(across(all_of(tmp), complete.cases) %>%
reduce(` `) %>%
is_greater_than(1))
-output
# A tibble: 6 × 4
a_X_values a_Y_values b_X_values b_Y_values
<dbl> <dbl> <dbl> <dbl>
1 NA 3 7 NA
2 NA 5 7 NA
3 3 NaN 8 NaN
4 4 NaN 3 1
5 5 NaN 3 1
6 6 NaN 3 1