I am struggling to filter nested tibbles, when one row doesn't contain a nested tibble.
my_df
contains a nested tibble in the column products
. I want to filter the nested tibble so that it only contains the value apple
in its column food
.
I can do this withmutate(products=map(products, ~filter(.x, str_detect(food, "apple")))
.
However, I fail to do so, when there is a row in my_df
which contains no/an empty nested tibble (list()).
I tried to circumvent the issue by creating an auxiliary column which checks for the number of rows in the nested tibble and then apply the search only to those rows where nrow > 0. However, my approach with case_when
fails, and I don't know why.
I would be grateful for any hint. Note that I am aware that I could split my_df
in two separate dfs (one with list(), and one with nested tibbles), and later row_bind
them. The approach with case_when
seems to be more convenient in my use case and I would like to understand why it doesn't work. Below the reprex. Many thanks!
library(tidyverse)
my_df <- structure(list(branch_name = c("basket1", "basket2"), products = list(
structure(list(), class = c(
"tbl_df", "tbl",
"data.frame"
), row.names = integer(0), .Names = character(0)),
structure(list(
food = c(
"apple",
"grape"
),
supplier = c("john", "jack")),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -2L)
)
)), row.names = c(NA, -2L), class = c(
"tbl_df",
"tbl", "data.frame"
))
my_df
#> # A tibble: 2 x 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 x 0]>
#> 2 basket2 <tibble [2 x 2]>
#Try to filter the nested df 'products', keep only rows where str_detect(food, "apple")==T
#fails
x <- my_df %>%
mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = map(products, ~filter(.x, str_detect(food, "apple")))`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#filter works if in no row the nested df is list()
y <- my_df %>%
mutate(products_nrow=map_dbl(products, nrow)) %>%
filter(products_nrow>0) %>%
mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))
#correct result
y
#> # A tibble: 1 x 3
#> branch_name products products_nrow
#> <chr> <list> <dbl>
#> 1 basket2 <tibble [1 x 2]> 2
y$products
#> [[1]]
#> # A tibble: 1 x 2
#> food supplier
#> <chr> <chr>
#> 1 apple john
#account for nrows of nested df and use case_when; fails
my_df %>%
mutate(products_nrow=map_dbl(products, nrow)) %>%
mutate(products=case_when(
products_nrow>0 ~ map(products, ~filter(.x, str_detect(food, "apple"))),
TRUE ~ products))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = case_when(...)`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
Created on 2022-03-18 by the reprex package (v2.0.1)
CodePudding user response:
A hacky solution that doesn't answer your question directly, but possibly the easiest thing to do is simply unnest
(to remove the empty tibbles) and nest
again before applying your filter:
my_df %>%
unnest(products) %>%
nest(products = -branch_name) %>%
mutate(products=map(products, ~filter(.x, str_detect(food, "apple"))))
Resulting in:
# A tibble: 1 × 2
branch_name products
<chr> <list>
1 basket2 <tibble [1 × 2]>
CodePudding user response:
You could use an if
condition to e.g. check whether there is a column food
in the dataset:
library(dplyr)
library(purrr)
library(strings)
my_df %>%
mutate(products = map(products, ~ if ("food" %in% names(.x)) filter(.x, str_detect(food, "apple")) else .x))
#> # A tibble: 2 × 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 × 0]>
#> 2 basket2 <tibble [1 × 2]>
CodePudding user response:
Another possible solution:
library(tidyverse)
my_df[["products"]] <-
map(my_df[["products"]], ~ if (nrow(.x) != 0)
{filter(.x, food == "apple")} else {.x})
my_df
#> # A tibble: 2 × 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 × 0]>
#> 2 basket2 <tibble [1 × 2]>