Unnest data frame column with list when some lists are empty

Time:11-14

I have the following data (coming from Twitter):

structure(list(entities.urls = list(structure(list(), .Names = character(0)), 
    NULL, structure(list(), .Names = character(0)), structure(list(), .Names = character(0)), 
    structure(list(start = 245L, end = 268L, url = "https://something.com", 
        expanded_url = "https://www.rtlnieuws.nl/nieuws/nederland/artikel/5330834/spaartaks-spaarders-compensatie-hoge-raad", 
        display_url = "rtlnieuws.nl/nieuws/nederla…", images = list(
            structure(list(url = c("https://pbs.twimg.com/news_img/1569417166549663745/p12uVzUj?format=jpg&name=orig", 
            "https://pbs.twimg.com/news_img/1569417166549663745/p12uVzUj?format=jpg&name=150x150"
            ), width = c(1024L, 150L), height = c(576L, 150L)), class = "data.frame", row.names = 1:2)), 
        status = 200L, title = "Geen compensatie voor spaarders die te laat bezwaar maakten", 
        description = "Het kabinet gaat spaarders die te laat of geen bezwaar hebben gemaakt tegen de spaartaks niet compenseren. Dat bevestigen Haagse bronnen aan RTL Nieuws. Voor de zomer oordeelde de Hoge Raad al dat deze mensen geen recht hebben op compensatie.", 
        unwound_url = "https://www.rtlnieuws.nl/nieuws/nederland/artikel/5330834/spaartaks-spaarders-compensatie-hoge-raad"), class = "data.frame", row.names = 1L), 
    structure(list(start = 197L, end = 220L, url = "https://something.com", 
        expanded_url = "https://fd.nl/financiele-markten/1432905/oorlog-in-oekraine-is-ultieme-stresstest-voor-grondstoffenhandelaren?utm_medium=social&utm_source=app&utm_campaign=earned&utm_content=20220312&utm_term=app-ios", 
        display_url = "fd.nl/financiele-mar…", status = 200L, 
        unwound_url = "https://fd.nl/financiele-markten/1432905/oorlog-in-oekraine-is-ultieme-stresstest-voor-grondstoffenhandelaren?utm_medium=social&utm_source=app&utm_campaign=earned&utm_content=20220312&utm_term=app-ios"), class = "data.frame", row.names = 1L), 
    structure(list(), .Names = character(0)), structure(list(), .Names = character(0)), 
    structure(list(), .Names = character(0)), structure(list(), .Names = character(0)))), class = "data.frame", row.names = c(NA, 
10L))

Each element of the column entities.urls is either NULL, an empty list, or a data frame. I want to unnest that column so that every column of the nested data frames becomes a column in the top-level data frame. The result should also be in long format, so that each row is repeated once for every row of its nested data frame.

I have tried tidyr's unnest():

tweets_02 %>% unnest(entities.urls, keep_empty = TRUE)

which throws an error. I guess the problem is the empty lists, but I have found no way to filter them out efficiently.

CodePudding user response:

Are you looking for something like this?

library(tidyverse)

# Keep only the rows whose element is a data frame, then unnest them.
tweets_02 |>
  filter(map_lgl(entities.urls, is.data.frame)) |>
  unnest(entities.urls)
#> # A tibble: 2 x 10
#>   start   end url            expan~1 displ~2 images status title descr~3 unwou~4
#>   <int> <int> <chr>          <chr>   <chr>   <list>  <int> <chr> <chr>   <chr>  
#> 1   245   268 https://somet~ https:~ rtlnie~ <df>      200 Geen~ Het ka~ https:~
#> 2   197   220 https://somet~ https:~ fd.nl/~ <NULL>    200 <NA>  <NA>    https:~
#> # ... with abbreviated variable names 1: expanded_url, 2: display_url,
#> #   3: description, 4: unwound_url
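
Note that the filter above drops every row without a data frame, whereas the question passed keep_empty = TRUE, which suggests those rows should survive as rows of NA. A minimal sketch of that variant follows, assuming it is enough to turn every non-data-frame element into NULL before unnesting. The toy tibble and its values are hypothetical stand-ins for the data frame in the question, not the original data.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for the question's data: a list column mixing an
# empty list, NULL, and one-row data frames with different columns.
toy <- tibble(
  id = 1:4,
  entities.urls = list(
    list(),
    NULL,
    data.frame(start = 245L, url = "https://something.com", title = "a"),
    data.frame(start = 197L, url = "https://something.com")
  )
)

result <- toy |>
  # Replace anything that is not a data frame with NULL, so the column
  # holds only data frames and NULLs.
  mutate(entities.urls = map(
    entities.urls,
    function(x) if (is.data.frame(x)) x else NULL
  )) |>
  # keep_empty = TRUE keeps the NULL rows and fills their columns with NA.
  unnest(entities.urls, keep_empty = TRUE)

print(result)
```

With this approach the output has one row per original row that had no data frame, plus one row per row of each nested data frame, and columns missing from a given nested data frame are filled with NA.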

CodePudding user response:

Here is another approach using rrapply() in package rrapply with option how = "bind" to bind repeated observations in a nested list into a wide data.frame:

library(rrapply)

rrapply(tweets_02, how = "bind", options = list(coldepth = 3))

#>   start end                   url                                                                                                                                                                                             expanded_url                  display_url                                                                                                                                                          images.1.url images.1.width images.1.height status                                                       title                                                                                                                                                                                                                                        description                                                                                                                                                                                              unwound_url
#> 1   245 268 https://something.com                                                                                                      https://www.rtlnieuws.nl/nieuws/nederland/artikel/5330834/spaartaks-spaarders-compensatie-hoge-raad rtlnieuws.nl/nieuws/nederla… https://pbs.twimg.com/news_img/1569417166549663745/p12uVzUj?format=jpg&name=orig, https://pbs.twimg.com/news_img/1569417166549663745/p12uVzUj?format=jpg&name=150x150      1024, 150        576, 150    200 Geen compensatie voor spaarders die te laat bezwaar maakten Het kabinet gaat spaarders die te laat of geen bezwaar hebben gemaakt tegen de spaartaks niet compenseren. Dat bevestigen Haagse bronnen aan RTL Nieuws. Voor de zomer oordeelde de Hoge Raad al dat deze mensen geen recht hebben op compensatie.                                                                                                      https://www.rtlnieuws.nl/nieuws/nederland/artikel/5330834/spaartaks-spaarders-compensatie-hoge-raad
#> 2   197 220 https://something.com https://fd.nl/financiele-markten/1432905/oorlog-in-oekraine-is-ultieme-stresstest-voor-grondstoffenhandelaren?utm_medium=social&utm_source=app&utm_campaign=earned&utm_content=20220312&utm_term=app-ios        fd.nl/financiele-mar…                                                                                                                                                                    NA             NA              NA    200                                                        <NA>                                                                                                                                                                                                                                               <NA> https://fd.nl/financiele-markten/1432905/oorlog-in-oekraine-is-ultieme-stresstest-voor-grondstoffenhandelaren?utm_medium=social&utm_source=app&utm_campaign=earned&utm_content=20220312&utm_term=app-ios