When using tidyr::unnest_wider(), how to name new columns based on chr vector

Time:11-05

I have the following data structure:

library(tibble)

my_tbl <-
  tibble::tribble(
                   ~col_x,   ~col_y,
                   "a",      list(1, 2, 3),
                   "b",      list(4, 5, 6),
                   "c",      list(7, 8, 9)
                   )

I want to use tidyr::unnest_wider() to spread col_y into separate columns, with the names of the new columns taken from the animal_names vector:

animal_names <- c("dog", "cat", "zebra")

How can I make unnest_wider() apply the names from animal_names, and so avoid the following name-repair messages?

library(tidyr)

my_tbl %>%  
  unnest_wider(col_y)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> # A tibble: 3 x 4
#>   col_x  ...1  ...2  ...3
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         1     2     3
#> 2 b         4     5     6
#> 3 c         7     8     9

Desired output:

## # A tibble: 3 x 4
##   col_x   dog   cat zebra
##   <chr> <dbl> <dbl> <dbl>
## 1 a         1     2     3
## 2 b         4     5     6
## 3 c         7     8     9

Note that @akrun suggested adding names to the nested values before unnesting:

library(dplyr)
library(purrr)

my_tbl %>%
  mutate(across(col_y, ~map(., .f = ~set_names(.x, animal_names)))) %>%
  unnest_wider(col_y)
#> # A tibble: 3 x 4
#>   col_x   dog   cat zebra
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         1     2     3
#> 2 b         4     5     6
#> 3 c         7     8     9

However, this extra pass over every nested element is redundant and expensive on large datasets. Can't we just apply the names through unnest_wider()'s names_repair argument?

CodePudding user response:

names_repair is applied to the full set of output column names, not only to the columns created from the unnested one. So we build a vector containing the existing column names (excluding 'col_y') followed by animal_names, and pass that to names_repair:

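library(dplyr)
library(tidyr)

# Full set of output names: the existing columns except 'col_y', then the new names
nm1 <- c(setdiff(names(my_tbl), 'col_y'), animal_names)

my_tbl %>%
  unnest_wider(col_y, names_repair = ~ nm1) %>%
  suppressMessages()  # silence the intermediate "New names:" messages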

Output:

# A tibble: 3 × 4
  col_x   dog   cat zebra
  <chr> <dbl> <dbl> <dbl>
1 a         1     2     3
2 b         4     5     6
3 c         7     8     9
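
One caveat: building the name vector with setdiff() and then appending animal_names puts the new names at the end, which matches my_tbl only because col_y is its last column. As far as I know, unnest_wider() places the new columns where the unnested column was, so for a nested column in the middle the names would end up in the wrong order. Below is a minimal base-R sketch of building the name vector by position instead. Note that build_names is a hypothetical helper written for illustration, not part of tidyr, and col_z is a made-up column name.

```r
# Hypothetical helper (not part of tidyr): replace `nested_col` in `old_names`
# with `new_names`, keeping the position of the replaced column.
build_names <- function(old_names, nested_col, new_names) {
  pos <- match(nested_col, old_names)
  if (is.na(pos)) stop("column not found: ", nested_col)
  append(old_names[-pos], new_names, after = pos - 1)
}

animal_names <- c("dog", "cat", "zebra")

# Same result as the setdiff() approach when the nested column is last
build_names(c("col_x", "col_y"), "col_y", animal_names)
#> [1] "col_x" "dog"   "cat"   "zebra"

# Order is preserved when the nested column sits in the middle
# (col_z is an assumed extra column, not in the original data)
build_names(c("col_x", "col_y", "col_z"), "col_y", animal_names)
#> [1] "col_x" "dog"   "cat"   "zebra" "col_z"
```

The result could then be passed as names_repair = ~ build_names(names(my_tbl), "col_y", animal_names), under the same assumption about where the new columns are placed.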