Home > database >  How to filter nested tibble if one row contains list()/no nested tibble
How to filter nested tibble if one row contains list()/no nested tibble

Time:03-19

I am struggling to filter nested tibbles, when one row doesn't contain a nested tibble.

my_df contains a nested tibble in the column products. I want to filter the nested tibble so that it only contains the value apple in its column food.

I can do this withmutate(products=map(products, ~filter(.x, str_detect(food, "apple"))). However, I fail to do so, when there is a row in my_df which contains no/an empty nested tibble (list()).

I tried to circumvent the issue by creating an auxiliary column which checks for the number of rows in the nested tibble and then apply the search only to those rows where nrow > 0. However, my approach with case_when fails, and I don't know why.

I would be grateful for any hint. Note that I am aware that I could split my_df in two separate dfs (one with list(), and one with nested tibbles), and later row_bind them. The approach with case_when seems to be more convenient in my use case and I would like to understand why it doesn't work. Below the reprex. Many thanks!

library(tidyverse)


my_df <- structure(list(branch_name = c("basket1", "basket2"), products = list(
  structure(list(), class = c(
    "tbl_df", "tbl",
    "data.frame"
  ), row.names = integer(0), .Names = character(0)),
  structure(list(
    food = c(
      "apple",
      "grape"
    ),
    supplier = c("john", "jack")),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -2L)
  )
)), row.names = c(NA, -2L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))
my_df
#> # A tibble: 2 x 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 x 0]>
#> 2 basket2     <tibble [2 x 2]>


#Try to filter the nested df 'products', keep only rows where str_detect(food, "apple")==T
#fails
x <- my_df %>% 
  mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = map(products, ~filter(.x, str_detect(food, "apple")))`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found

  
#filter works  if in no row the nested df is list()
y <- my_df %>% 
  mutate(products_nrow=map_dbl(products, nrow)) %>% 
  filter(products_nrow>0) %>% 
  mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))

#correct result
y  
#> # A tibble: 1 x 3
#>   branch_name products         products_nrow
#>   <chr>       <list>                   <dbl>
#> 1 basket2     <tibble [1 x 2]>             2
y$products
#> [[1]]
#> # A tibble: 1 x 2
#>   food  supplier
#>   <chr> <chr>   
#> 1 apple john


#account for nrows of nested df and use case_when; fails
my_df %>% 
  mutate(products_nrow=map_dbl(products, nrow)) %>% 
  mutate(products=case_when(
    products_nrow>0 ~ map(products, ~filter(.x, str_detect(food, "apple"))),
    TRUE ~ products))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = case_when(...)`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found

Created on 2022-03-18 by the reprex package (v2.0.1)

CodePudding user response:

A hacky solution that doesn't answer your question directly, but possibly the easiest thing to do is simply unnest (to remove the empty tibbles) and nest again before applying your filter:

 my_df %>% 
   unnest(products) %>%
   nest(products = -branch_name) %>%
   mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))

Resulting in:

# A tibble: 1 × 2
  branch_name products        
  <chr>       <list>          
1 basket2     <tibble [1 × 2]>

CodePudding user response:

You could use an if condition to e.g. check whether there is a column food in the dataset:

library(dplyr)
library(purrr)
library(strings)

my_df %>% 
  mutate(products = map(products, ~ if ("food" %in% names(.x)) filter(.x, str_detect(food, "apple")) else .x))
#> # A tibble: 2 × 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 × 0]>
#> 2 basket2     <tibble [1 × 2]>

CodePudding user response:

Another possible solution:

library(tidyverse)

my_df[["products"]] <-
 map(my_df[["products"]], ~ if (nrow(.x) != 0) 
     {filter(.x, food == "apple")} else {.x})

my_df

#> # A tibble: 2 × 2
#>   branch_name products        
#>   <chr>       <list>          
#> 1 basket1     <tibble [0 × 0]>
#> 2 basket2     <tibble [1 × 2]>
  • Related