For instance, given the following nested dataframe:
library(tidyverse)
mpg_nested = mpg %>%
group_by(manufacturer) %>%
nest()
How can I use the data within the nested structure to subset the overall nested object? In this example, say I wanted to only keep manufacturers within the nested dataframe that had the category "suv", that is to say the following manufacturers:
library(tidyverse)
mpg %>%
group_by(manufacturer) %>%
filter(class == "suv") %>%
distinct(manufacturer)
#> # A tibble: 10 x 1
#> # Groups: manufacturer [10]
#> manufacturer
#> <chr>
#> 1 chevrolet
#> 2 dodge
#> 3 ford
#> 4 jeep
#> 5 land rover
#> 6 lincoln
#> 7 mercury
#> 8 nissan
#> 9 subaru
#> 10 toyota
I know how to filter the actual data within the nested structure...
mpg_nested %>%
mutate(data = map(data, function(.x){
.x %>% filter(class == "suv")
}))
But how do I only keep the nested rows which have suvs?
Cheers
CodePudding user response:
Using purrr::map_lgl()
to produce a logical vector input to filter()
:
mpg_nested %>%
filter(
map_lgl(data, ~any(.x$class == "suv"))
)
#> # A tibble: 10 x 2
#> # Groups: manufacturer [10]
#> manufacturer data
#> <chr> <list>
#> 1 chevrolet <tibble [19 × 10]>
#> 2 dodge <tibble [37 × 10]>
#> 3 ford <tibble [25 × 10]>
#> 4 jeep <tibble [8 × 10]>
#> 5 land rover <tibble [4 × 10]>
#> 6 lincoln <tibble [3 × 10]>
#> 7 mercury <tibble [4 × 10]>
#> 8 nissan <tibble [13 × 10]>
#> 9 subaru <tibble [14 × 10]>
#> 10 toyota <tibble [34 × 10]>
CodePudding user response:
If you've done the fitering of the nested data frames and you want to discard the rows where the nested data frames have 0 rows:
mpg_nested %>%
mutate(data = map(data, function(.x){
.x %>% filter(class == "suv")
})) %>%
filter(map_int(data, nrow) > 0)
# # A tibble: 10 × 2
# # Groups: manufacturer [10]
# manufacturer data
# <chr> <list>
# 1 chevrolet <tibble [9 × 10]>
# 2 dodge <tibble [7 × 10]>
# 3 ford <tibble [9 × 10]>
# 4 jeep <tibble [8 × 10]>
# 5 land rover <tibble [4 × 10]>
# 6 lincoln <tibble [3 × 10]>
# 7 mercury <tibble [4 × 10]>
# 8 nissan <tibble [4 × 10]>
# 9 subaru <tibble [6 × 10]>
# 10 toyota <tibble [8 × 10]>
If you want to keep the nested data frames intact but filter out rows of the top level data frame that where the nested ones don't have SUVs:
mpg_nested %>%
filter(map_lgl(data, \(x) "suv" %in% x[["class"]]))
# # A tibble: 10 × 2
# # Groups: manufacturer [10]
# manufacturer data
# <chr> <list>
# 1 chevrolet <tibble [19 × 10]>
# 2 dodge <tibble [37 × 10]>
# 3 ford <tibble [25 × 10]>
# 4 jeep <tibble [8 × 10]>
# 5 land rover <tibble [4 × 10]>
# 6 lincoln <tibble [3 × 10]>
# 7 mercury <tibble [4 × 10]>
# 8 nissan <tibble [13 × 10]>
# 9 subaru <tibble [14 × 10]>
# 10 toyota <tibble [34 × 10]>
CodePudding user response:
An alternative idea via group_split()
library(tidyverse)
mpg_nested <- mpg %>%
group_split(manufacturer) %>%
map(~ .x %>% filter(class == "suv")) %>%
purrr::discard(~ nrow(.x) == 0)