Home > OS >  How do I unnest columns of tibble
How do I unnest columns of tibble

Time:08-10

Below is a tibble. I wish to unnest it. The desired tibble should contain 16 rows.

  1. The first row should expand to 4 rows
  2. The second row should expand to 2 rows
  3. The third row should expand to 5*2 rows

The problem here is that in the third row, we have a vector of size 5 and a vector of size 2. I want that row to expand to 5 * 2 = 10 rows.

library(tidyverse)

tbl <- tibble(
  a_X_values = list(NA, c(3, NA), c(4, 5, 6, NaN, NaN)),
  a_Y_values = list(c(NaN, NaN, 3, 5), NaN, NaN),
  b_X_values = list(7, 8, c(3, NaN)),
  b_Y_values = c(NA, NaN, 1)
)

tbl1 <- tbl %>% 
  unnest(where(is.list)) 

The desired output would be

tbl_desired <- tibble(
  a_X_values = c(NA, NA, NA, NA, 3, NA, 4, 5, 6, NaN, NaN, 4, 5, 6, NaN, NaN ),
  a_Y_values = c(NaN, NaN, 3, 5, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN),
  b_X_values = c(7, 7, 7, 7, 8, 8, 3, NaN, 3, NaN, 3, NaN, 3, NaN, 3, NaN),
  b_Y_values = c(NA, NA, NA, NA, NaN, NaN, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

Edit - adding another question -

I have a small request. If you can help me with that. 3 rows were extended to 16 rows, but all of them do not make sense. I am only interested in rows which contain a combination of real numbers. Therefore, the only useful rows are # 3, 4, 5, 7, 9, 11. Is there a way to reduce these rows only?

CodePudding user response:

For sequential unnesting, I'd use a for loop:

list_cols = sapply(tbl, is.list)
for(col in names(list_cols)[list_cols]){
  tbl <- tidyr::unnest(tbl, all_of(col))
}
tbl
# # A tibble: 16 × 4
#    a_X_values a_Y_values b_X_values b_Y_values
#         <dbl>      <dbl>      <dbl>      <dbl>
#  1         NA        NaN          7         NA
#  2         NA        NaN          7         NA
#  3         NA          3          7         NA
#  4         NA          5          7         NA
#  5          3        NaN          8        NaN
#  6         NA        NaN          8        NaN
#  7          4        NaN          3          1
#  8          4        NaN        NaN          1
#  9          5        NaN          3          1
# 10          5        NaN        NaN          1
# 11          6        NaN          3          1
# 12          6        NaN        NaN          1
# 13        NaN        NaN          3          1
# 14        NaN        NaN        NaN          1
# 15        NaN        NaN          3          1
# 16        NaN        NaN        NaN          1

CodePudding user response:

In tidyverse, we can use reduce

library(purrr)
library(tidyr)
reduce(1:3, unnest, .init = tbl)

-output

# A tibble: 16 × 4
   a_X_values a_Y_values b_X_values b_Y_values
        <dbl>      <dbl>      <dbl>      <dbl>
 1         NA        NaN          7         NA
 2         NA        NaN          7         NA
 3         NA          3          7         NA
 4         NA          5          7         NA
 5          3        NaN          8        NaN
 6         NA        NaN          8        NaN
 7          4        NaN          3          1
 8          4        NaN        NaN          1
 9          5        NaN          3          1
10          5        NaN        NaN          1
11          6        NaN          3          1
12          6        NaN        NaN          1
13        NaN        NaN          3          1
14        NaN        NaN        NaN          1
15        NaN        NaN          3          1
16        NaN        NaN        NaN          1

If it needs to pick the list columns automatically, find the column index or column names where the columns are list (from select) and use that in the reduce

library(dplyr)
tbl %>%
   select(where(is.list)) %>% 
   names %>%
   reduce(unnest, .init = tbl)
# A tibble: 16 × 4
   a_X_values a_Y_values b_X_values b_Y_values
        <dbl>      <dbl>      <dbl>      <dbl>
 1         NA        NaN          7         NA
 2         NA        NaN          7         NA
 3         NA          3          7         NA
 4         NA          5          7         NA
 5          3        NaN          8        NaN
 6         NA        NaN          8        NaN
 7          4        NaN          3          1
 8          4        NaN        NaN          1
 9          5        NaN          3          1
10          5        NaN        NaN          1
11          6        NaN          3          1
12          6        NaN        NaN          1
13        NaN        NaN          3          1
14        NaN        NaN        NaN          1
15        NaN        NaN          3          1
16        NaN        NaN        NaN          1

If we need to subset the rows based on the number of non-NA elements

reduce(1:3, unnest, .init = tbl) %>%
   filter(rowSums(!is.na(.[1:3])) > 1)
# A tibble: 6 × 4
  a_X_values a_Y_values b_X_values b_Y_values
       <dbl>      <dbl>      <dbl>      <dbl>
1         NA          3          7         NA
2         NA          5          7         NA
3          3        NaN          8        NaN
4          4        NaN          3          1
5          5        NaN          3          1
6          6        NaN          3          1

Or if we need to do this programmatically

library(pipeR)
library(magrittr)
tbl %>%
   select(where(is.list)) %>% 
   names %>>%
   (~ tmp) %>% 
   reduce(unnest, .init = tbl) %>%
   filter(across(all_of(tmp), complete.cases) %>% 
               reduce(` `) %>%
              is_greater_than(1))

-output

# A tibble: 6 × 4
  a_X_values a_Y_values b_X_values b_Y_values
       <dbl>      <dbl>      <dbl>      <dbl>
1         NA          3          7         NA
2         NA          5          7         NA
3          3        NaN          8        NaN
4          4        NaN          3          1
5          5        NaN          3          1
6          6        NaN          3          1
  • Related