Home > Back-end >  Nest a tibble by column prefix
Nest a tibble by column prefix

Time:07-04

We do a normal nesting grouping by rows. Mine is different. I want to create a nested tibble grouping by column prefixes (before the first '_'), preserving the original column names in the nested tibbles. The current approach works but looks overcomplicated.

tibble(a_1=1:3, a_2=2:4, b_1=3:5) %>% 
    print() %>%

#  A tibble: 3 x 3
#     a_1   a_2   b_1
#   <int> <int> <int>
# 1     1     2     3
# 2     2     3     4
# 3     3     4     5

    pivot_longer(everything()) %>% 
    nest(data=-name) %>% 
    mutate(data=map2(data, name, ~rename(.x, '{.y}' := value))) %>% 
    mutate(gr=str_extract(name, '^[^_] '), .keep='unused') %>% 
    nest(data=-gr) %>% 
    mutate(data=map(data, ~bind_cols(.[[1]]))) %>%
    print() %>%

# A tibble: 2 x 2
#   gr    data            
#   <chr> <list>          
# 1  a     <tibble [3 x 2]>
# 2  b     <tibble [3 x 1]>

    { .$data[[1]] }

# A tibble: 3 x 2
#     a_1   a_2
#   <int> <int>
# 1     1     2
# 2     2     3
# 3     3     4

UPD: if possible, tidyverse solution

CodePudding user response:

Using a neat little trick I learned lately you could do:

library(tidyr)
library(dplyr, warn = FALSE)

tibble(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5) %>%
  split.default(., gsub("_[0-9]", "", names(.))) %>%
  lapply(nest, data = everything()) %>%
  bind_rows(.id = "gr")
#> # A tibble: 2 × 2
#>   gr    data            
#>   <chr> <list>          
#> 1 a     <tibble [3 × 2]>
#> 2 b     <tibble [3 × 1]>

CodePudding user response:

Another possible solution, based on purrr::map_dfr:

library(tidyverse)

map_dfr(unique(str_remove(names(df), "_\\d ")), 
  ~ tibble(gr = .x, nest(select(df, which(str_detect(names(df), .x))), 
      data = everything())))

#> # A tibble: 2 × 2
#>   gr    data            
#>   <chr> <list>          
#> 1 a     <tibble [3 × 2]>
#> 2 b     <tibble [3 × 1]>
  • Related