mixing unnest_longer and unnest_wider

Time:09-10

I'm (once more) stuck with flattening nested lists.

I have this tibble with some list-columns (originating from a JSON format).

    library(tidyr)
    library(dplyr)
    df = tibble(id = c(1, 2, 3),
            branch = list(NULL, list(colA = 'abc', colB = 'mno'),
                          list(list(colA = 'def', colB = 'uvw'),
                               list(colA = 'ghi', colB = 'xyz'))))

I want to unnest_wider column 'branch'. That works with rows 1 and 2:

    df %>%
      slice(1:2) %>%
      unnest_wider(branch)

However, row 3 consists of a list of lists which I have to unnest_longer first:

    bind_rows(
      df %>% slice(1, 2),
      df %>% slice(3) %>% unnest_longer(branch)) %>%
      unnest_wider(branch)

The code above gives the desired output, but I'm looking for a generic solution along these lines:

If an element of column 'branch' is an unnamed list (indicating a list of lists), apply unnest_longer to it. Afterwards, apply unnest_wider to the whole column 'branch'.

Any help appreciated!

CodePudding user response:

A little bit convoluted but here's a possible solution:

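  1. Iterate through the rows of your df
  2. Check whether the element is an unnamed list, i.e. is.list(df$branch[[index]]) is TRUE and names(df$branch[[index]]) is NULL
  3. If it is an unnamed list --> slice the row and unnest_longer it; otherwise (named list or NULL) --> just slice the row
  4. Finally, unnest_wider() the combined result

    library(tidyr)
    library(dplyr)
    library(purrr)

    map_df(seq_len(nrow(df)), function(x) {
      b <- df$branch[[x]]
      # An unnamed list is a list of records. NULL is deliberately excluded,
      # because unnest_longer() drops empty elements by default.
      if (is.list(b) && is.null(names(b))) {
        df %>% slice(x) %>% unnest_longer(branch)
      } else {
        df %>% slice(x)
      }
    }) %>%
      unnest_wider(branch)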

Which returns:

# A tibble: 4 × 3
     id colA  colB 
  <dbl> <chr> <chr>
1     1 NA    NA   
2     2 abc   mno  
3     3 def   uvw  
4     3 ghi   xyz
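
If you'd rather avoid the row-by-row loop, one option is to give every cell the same shape first (wrap a single record in a list) and then unnest the whole column in one go. This is only a sketch: `as_records` is a hypothetical helper name, and `keep_empty = TRUE` assumes tidyr 1.2.0 or later, where `unnest_longer()` has that argument.

```r
library(tidyr)
library(dplyr)
library(purrr)

df <- tibble(id = c(1, 2, 3),
             branch = list(NULL, list(colA = 'abc', colB = 'mno'),
                           list(list(colA = 'def', colB = 'uvw'),
                                list(colA = 'ghi', colB = 'xyz'))))

# Hypothetical helper: a named list is a single record, so wrap it in list();
# NULL and unnamed lists (already lists of records) are returned unchanged.
as_records <- function(x) {
  if (is.null(x) || is.null(names(x))) x else list(x)
}

res <- df %>%
  mutate(branch = map(branch, as_records)) %>%
  # keep_empty = TRUE keeps the row whose branch is NULL (assumes tidyr >= 1.2.0)
  unnest_longer(branch, keep_empty = TRUE) %>%
  unnest_wider(branch)

res
```

The idea is that after the `mutate()` step every non-empty cell is a list of records, so a single `unnest_longer()` followed by `unnest_wider()` should be enough for the whole column.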

CodePudding user response:

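    library(tidyverse)

    df <- tibble(
      id = c(1, 2, 3),
      branch = list(
        NULL, list(colA = "abc", colB = "mno"),
        list(
          list(colA = "def", colB = "uvw"),
          list(colA = "ghi", colB = "xyz")
        )
      )
    )

    # Unnest one group of rows. `grp` arrives as the name of the split
    # ("TRUE" or "FALSE"), so convert it to a logical explicitly.
    unnester <- function(x, grp) {
      if (as.logical(grp)) {
        x <- x |> unnest_longer(branch)
      }
      unnest_wider(x, branch)
    }

    df |>
      rowwise() |>
      # More than two names after flattening means several records in one cell
      # (this relies on each record having exactly two fields).
      mutate(grp = length(names(unlist(branch))) > 2) |>
      ungroup() |>
      split(~grp) |>
      imap_dfr(~ unnester(.x, .y)) |>
      select(-grp)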
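
To see why the `> 2` test separates the rows, it can help to look at how many names are left after flattening each cell. The snippet below is a small sketch using base R plus tibble; `n_names` and `is_nested` are illustrative names, and the structural check at the end is a suggested alternative rather than part of the code above.

```r
library(tibble)

df <- tibble(
  id = c(1, 2, 3),
  branch = list(
    NULL, list(colA = "abc", colB = "mno"),
    list(
      list(colA = "def", colB = "uvw"),
      list(colA = "ghi", colB = "xyz")
    )
  )
)

# Number of names left after flattening each cell:
# 0 for NULL, 2 for a single record, 4 for two records
n_names <- vapply(df$branch, function(b) length(names(unlist(b))), integer(1))
n_names

# Alternative check that looks at the structure instead of counting fields:
# TRUE only for an unnamed list, i.e. a list of records
is_nested <- vapply(df$branch, function(b) is.list(b) && is.null(names(b)), logical(1))
is_nested
```

Counting names only works while every record has exactly two fields; the structural check does not depend on the number of fields and could be used for `grp` instead.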