Home > Software engineering >  How to use dplyr to filter rows where value in a specific column is 1 and all the rest are 0?
How to use dplyr to filter rows where value in a specific column is 1 and all the rest are 0?

Time:12-30

Using dplyr functions, I want to remove rows in which only column b equals 1 and the rest of columns are all 0.

Although I can do this:

library(dplyr, warn.conflicts = FALSE)

trb <-
  tribble(~a, ~b, ~c,
          1, 1, 1,
          1, 1, 0,
          1, 0, 1,
          0, 1, 0, # <~~~ remove this
          0, 0, 0,
          0, 1, 0  # <~~~ remove this
          )

trb %>%
  filter(!(b == 1 & a == 0 & c == 0))
#> # A tibble: 4 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     1     1
#> 2     1     1     0
#> 3     1     0     1
#> 4     0     0     0

I'm looking for a more scalable solution to account for data such as:

trb_2 <-
  tibble::tribble(
    ~a, ~b, ~c, ~d, ~e, ~f, ~g, ~h, ~i, ~j, ~k, ~l, ~m, ~n, ~o, ~p, ~q, ~r, ~s, ~t, ~u, ~v, ~w, ~x, ~y, ~z,
    0,  0,  1,  0,  1,  1,  1,  0,  0,  0,  0,  1,  1,  0,  1,  0,  0,  1,  1,  0,  0,  1,  0,  0,  0,  0,
    1,  0,  1,  1,  1,  0,  1,  1,  1,  0,  0,  1,  1,  1,  1,  1,  0,  1,  1,  0,  1,  0,  0,  1,  1,  1,
    0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    1,  1,  0,  0,  0,  0,  1,  1,  1,  0,  0,  1,  1,  1,  0,  1,  1,  0,  1,  1,  1,  1,  0,  1,  1,  1,
    0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    1,  1,  0,  0,  0,  1,  0,  1,  1,  1,  1,  0,  0,  0,  1,  0,  0,  1,  1,  0,  0,  0,  0,  0,  0,  1,
    1,  0,  0,  0,  1,  0,  1,  1,  0,  0,  0,  1,  0,  1,  0,  0,  1,  0,  0,  1,  1,  0,  0,  0,  0,  0,
    0,  1,  1,  0,  0,  0,  0,  1,  1,  1,  1,  0,  1,  1,  1,  0,  1,  0,  1,  0,  1,  0,  0,  0,  0,  0,
    0,  0,  1,  0,  1,  0,  1,  1,  1,  0,  1,  1,  1,  0,  0,  1,  0,  0,  0,  1,  1,  1,  0,  1,  1,  0,
    0,  0,  0,  1,  0,  0,  0,  1,  0,  1,  1,  0,  1,  0,  0,  0,  0,  1,  0,  0,  1,  1,  0,  0,  1,  1
  )

In trb_2 I still want to remove the rows in which b equals 1 and all the rest are 0.


Is there a scalable way to achieve this using dplyr::filter()?

CodePudding user response:

Yes, using the new helper function dplyr::if_all() you can do this for no matter how many columns you have:

trb %>% 
  filter(!(b == 1 & if_all(-b, ~ .x == 0)))

Result:

# A tibble: 4 x 3
      a     b     c
  <dbl> <dbl> <dbl>
1     1     1     1
2     1     1     0
3     1     0     1
4     0     0     0

Breakdown of !(b == 1 & if_all(-b, ~ .x == 0)):

  • b == 1 will match rows where b is 1
  • if_all(-b, ~ .x == 0) will match rows where all columns except b are exactly 0
  • !(b == 1 & if_all(-b, ~ .x == 0)) combines these two expressions and removes the rows where both are true

CodePudding user response:

trb %>%
  filter(b != 1 | rowSums(. == 1) != 1)
# # A tibble: 4 x 3
#       a     b     c
#   <dbl> <dbl> <dbl>
# 1     1     1     1
# 2     1     1     0
# 3     1     0     1
# 4     0     0     0
  • Related