Home > front end >  tidy way of filtering, but putting complement into tibble
tidy way of filtering, but putting complement into tibble

Time:05-06

Let's say I'm filtering a tibble, doing some processing, filtering and then doing some more processing, but I want to keep the complement of data that I filtered out at each step.

E.g. instead of

library(tidyverse)
data(mtcars)
mtcars %>%
  filter(cyl<5) %>%
  filter(gear>3 & wt>3) %>%
  filter(mpg>23)

I might want something like this without doing separate steps where I store a tibble, do another step to store the complement and then do another step etc.

library(tidyverse)
data(mtcars)
mtcars1 = mtcars %>%
  filter(cyl<5, keep_complement="mtcars2") %>%
  filter(gear>3 & wt>3, keep_complement="mtcars3") %>%
  filter(mpg>23, keep_complement="mtcars4")
# Desired outcome: 4 tibbles mtcars1 to 4

mtcars_final = bind_rows(mtcars1, mtcars2, mtcars3, mtcars4)

In case you wonder why I want this: I have some increasingly complex string operations to solve a problem, first simple direct comparison, then some regex/fuzzy string matching, then something for which I may end up having to use a neural network. It feels like there should be some neat way of doing the expensive operations only on a subset without having to write the code in so many steps.

I.e. what I'm trying to avoid is something really clumsy looking like this (which also requires me to invert any filtering operations myself - in practice also having to consider NA values etc.):

mtcars_tmp <- mtcars %>%
    filter(cyl<5)

mtcars2 <- mtcars %>%
    filter(cyl>=5)

mtcars_tmp2 <- mtcars_tmp %>%
    filter(gear>3 & wt>3)

mtcars3 <- mtcars_tmp %>%
    filter(gear<=3 | wt<=3)

mtcars1 <- mtcars_tmp2 %>%
    filter(mpg>23)

mtcars4 <- mtcars_tmp2 %>%
    filter(mpg<=23)

mtcars_final = bind_rows(mtcars1, mtcars2, mtcars3, mtcars4)

CodePudding user response:

This will create the complement using anti_join and assigns it to a new object and returns the filtered results so it can be used just as a normal dplyr::filter in the pipe:

library(tidyverse)

#' Filters a data.frame and saves the complement
#' @param keep_complement charachter to name the object the complement is saved to. NULL to not save it.
filter_complement <- function(.data, ..., keep_complement = NULL) {
  res <- dplyr::filter(.data = .data, ...)
  
  if(! is.null(keep_complement)) {
    complement <- dplyr::anti_join(.data, res)
    assign(keep_complement, complement, envir = globalenv())  
  }
  
  res
}

mtcars %>%
  filter(cyl < 5) %>%
  filter(gear > 3 & wt > 3) %>%
  filter(mpg > 23)
#>            mpg cyl  disp hp drat   wt qsec vs am gear carb
#> Merc 240D 24.4   4 146.7 62 3.69 3.19   20  1  0    4    2

mtcars %>%
  filter_complement(cyl < 5, keep_complement = "mtcars2") %>%
  filter_complement(gear > 3 & wt > 3, keep_complement = "mtcars3") %>%
  filter_complement(mpg > 23, keep_complement = "mtcars4")
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
#> "gear", "carb")
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
#> "gear", "carb")
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
#> "gear", "carb")
#>            mpg cyl  disp hp drat   wt qsec vs am gear carb
#> Merc 240D 24.4   4 146.7 62 3.69 3.19   20  1  0    4    2

mtcars4
#>           mpg cyl  disp hp drat   wt qsec vs am gear carb
#> Merc 230 22.8   4 140.8 95 3.92 3.15 22.9  1  0    4    2
mtcars3
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Created on 2022-05-06 by the reprex package (v2.0.0)

  • Related