Home > database >  How to unnest a tibble list column with both list and character entries ("cannot combine list a
How to unnest a tibble list column with both list and character entries ("cannot combine list a

Time:12-15

I have a nested tibble, that I would like to unnest. Two list columns (street_address and status) contain both character vectors and lists and one list column (country) contains only character vectors. When unnesting the tibble, an error is produced, apparently due to the fact that there are both character vectors and lists in the two columns with both types of entries.

df <- tibble::tribble(
                          ~id,         ~country,                                                                                                                       ~street_address,                                                                   ~status,
          "2008-002231-32-GB", c("United Kingdom", "Netherlands"),                                                                                                                       list(c(NA, NA)),                                                           list(c(NA, NA)),
         "2020-001060-28-SE",  c("Denmark", "Denmark", "Denmark", "Denmark"), c("Palle Juul Jensens Blvrd 67", "Palle Juul Jensens Boulevard 99", "Palle Juul Jensens Blvrd 67", "Palle Juul Jensens Boulevard 99"), c("Non-Commercial", "Non-Commercial", "Non-Commercial", "Non-Commercial")
          )

df
# A tibble: 2 × 4
  id                country   street_address status    
  <chr>             <list>    <list>         <list>    
1 2008-002231-32-GB <chr [2]> <list [1]>     <list [1]>
2 2020-001060-28-SE <chr [4]> <chr [4]>      <chr [4]> 

df %>%
unnest(cols = c(country, street_address, status))
# >Error: Can't combine `..1$street_address` <list> and `..2$street_address` <character>.

Created on 2021-12-14 by the reprex package (v2.0.1)

It seems that it is the presence of the list entries in the columns that is the problem (all in the format of list(c(NA, NA))). One option could perhaps be to change those observations to a character vector (or make them NA, as they seem to all be NAs), but I cannot figure out how to do that or whether that would solve the problem. Any help would be greatly appreciated.

NB Please note, that this is an updated question, as the first data I submitted with the question, which I produced using dpasta() was not a good representation of my actual data.

The desired result should look something like this:

# A tibble: 8 × 4
  id                country        street_address                  status        
  <chr>             <chr>          <chr>                           <chr>         
1 2020-001060-28-SE Denmark        Palle Juul Jensens Blvrd 67     Non-Commercial
2 2020-001060-28-SE Denmark        Palle Juul Jensens Boulevard 99 Non-Commercial
3 2020-001060-28-SE Denmark        Palle Juul Jensens Blvrd 67     Non-Commercial
4 2020-001060-28-SE Denmark        Palle Juul Jensens Boulevard 99 Non-Commercial
5 2008-002231-32-GB United Kingdom NA                              NA                       
6 2008-002231-32-GB Netherlands    NA                              NA                        
> 
``

CodePudding user response:

You can unnest the data using unnest function in tidyr. The code looks like following:

library(tidyr)

df %>%
  mutate(r = map(street_address, ~data.frame(t(.))), s = map(status, ~data.frame(t(.)))) %>%
  unnest(r, s) %>%
  select(-street_address, -status)

And the output looks like this:

# A tibble: 6 x 12
  id     country  t...   X1     X2     X3     X4    t...1 X11   X21   X31   X41  
  <chr>  <chr>    <list> <chr>  <chr>  <chr>  <chr> <lis> <chr> <chr> <chr> <chr>
1 2008-~ United ~ <lgl ~ NA     NA     NA     NA    <lgl~ NA    NA    NA    NA   
2 2008-~ Netherl~ <lgl ~ NA     NA     NA     NA    <lgl~ NA    NA    NA    NA   
3 2020-~ Denmark  <NULL> Palle~ Palle~ Palle~ Pall~ <NUL~ Non-~ Non-~ Non-~ Non-~
4 2020-~ Denmark  <NULL> Palle~ Palle~ Palle~ Pall~ <NUL~ Non-~ Non-~ Non-~ Non-~
5 2020-~ Denmark  <NULL> Palle~ Palle~ Palle~ Pall~ <NUL~ Non-~ Non-~ Non-~ Non-~
6 2020-~ Denmark  <NULL> Palle~ Palle~ Palle~ Pall~ <NUL~ Non-~ Non-~ Non-~ Non-~

CodePudding user response:

library(tidyverse)
df <- tibble::tribble(
  ~ id,
  ~ country,
  ~ street_address,
  ~ status,
  "2008-002231-32-GB",
  c("United Kingdom", "Netherlands"),
  list(c(NA, NA)),
  list(c(NA, NA)),
  "2020-001060-28-SE",
  c("Denmark", "Denmark", "Denmark", "Denmark"),
  c(
    "Palle Juul Jensens Blvrd 67",
    "Palle Juul Jensens Boulevard 99",
    "Palle Juul Jensens Blvrd 67",
    "Palle Juul Jensens Boulevard 99"
  ),
  c(
    "Non-Commercial",
    "Non-Commercial",
    "Non-Commercial",
    "Non-Commercial"
  )
)


df %>% mutate(res = map_chr(street_address, class)) %>% 
  group_split(res) %>% 
  map(~unnest(data = ., c(country, street_address, status))) %>% 
  map_df(~unnest(data = ., c(country, street_address, status))) %>% 
  select(-res)
#> # A tibble: 8 x 4
#>   id                country        street_address                  status       
#>   <chr>             <chr>          <chr>                           <chr>        
#> 1 2020-001060-28-SE Denmark        Palle Juul Jensens Blvrd 67     Non-Commerci~
#> 2 2020-001060-28-SE Denmark        Palle Juul Jensens Boulevard 99 Non-Commerci~
#> 3 2020-001060-28-SE Denmark        Palle Juul Jensens Blvrd 67     Non-Commerci~
#> 4 2020-001060-28-SE Denmark        Palle Juul Jensens Boulevard 99 Non-Commerci~
#> 5 2008-002231-32-GB United Kingdom <NA>                            <NA>         
#> 6 2008-002231-32-GB United Kingdom <NA>                            <NA>         
#> 7 2008-002231-32-GB Netherlands    <NA>                            <NA>         
#> 8 2008-002231-32-GB Netherlands    <NA>                            <NA>

Created on 2021-12-14 by the reprex package (v2.0.1)

  • Related