Home > Mobile >  filter on multiple conditions simultaneously in R
filter on multiple conditions simultaneously in R

Time:10-29

I have

df = structure(list(`Q4-21` = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 
1L, 0L, 1L, 0L, 1L, 0L, 1L), `Q1-22` = c(0L, 0L, 1L, 1L, 0L, 
0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L), `Q2-22` = c(0L, 
0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L), 
    `Q3-22` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), Name = c("A", "B", "C", "D", "E", "F", "G", 
    "H", "I", "J", "K", "L", "M", "N", "O", "P")), row.names = c(NA, 
-16L), class = "data.frame")

enter image description here

I want

  • to filter where columns 3,4, and 5 are not simultaneously 0
  • to refer to each column by position, as column names will change repeatedly

I have tried:

cols_of_interest = colnames(df)[2:4]
df %>%
  filter_at(which(colnames(df) %in% cols_of_interest), all_vars(. !=0))

but this filters on each column separately not being 0

I need to filter out rows A and B

Am aware of df[df[,2] !=0 | df[,3] !=0 | df[,4]!= 0,], but would prefer tidyverse method

Any suggestions?

CodePudding user response:

Use if_all:

library(dplyr)
df %>% 
  filter(!if_all(2:4, ~ .x == 0))

CodePudding user response:

Use rowSums.

df[rowSums(df[2:4]) != 0, ]

Or

subset(df, rowSums(df[2:4]) != 0)

#    Q4-21 Q1-22 Q2-22 Q3-22 Name
# 3      0     1     0     0    C
# 4      1     1     0     0    D
# 5      0     0     1     0    E
# 6      1     0     1     0    F
# 7      0     1     1     0    G
# 8      1     1     1     0    H
# 9      0     0     0     1    I
# 10     1     0     0     1    J
# 11     0     1     0     1    K
# 12     1     1     0     1    L
# 13     0     0     1     1    M
# 14     1     0     1     1    N
# 15     0     1     1     1    O
# 16     1     1     1     1    P

CodePudding user response:

A possible approach would be to calculate a sum of these 3 columns and then filter the rows whose sum is greater than 0, with the following code:

    # in a single line of code
     filter(df, rowSums(df[,cols_of_interest]) > 0)

The same, but in several lines and with apply (keeping track of the col' created for filter out) =>

     df$sum_of_3_cols = apply(df[,cols_of_interest], 
                        MARGIN = 1, FUN = sum, na.rm = T) 
     # ↑ compute a sum of these 3 col of interest

     df %>% filter(sum_of_3_cols > 0 ) %>% select(-sum_of_3_cols)
      # ↑ filter out                       ↑ remove the column used to filter

CodePudding user response:

You can use the pmax:

library(dplyr)

df %>%
  filter(do.call(pmax, .[2:4])>0)
  
   Q4-21 Q1-22 Q2-22 Q3-22 Name
1      0     1     0     0    C
2      1     1     0     0    D
3      0     0     1     0    E
4      1     0     1     0    F
5      0     1     1     0    G
6      1     1     1     0    H
7      0     0     0     1    I
8      1     0     0     1    J
9      0     1     0     1    K
10     1     1     0     1    L
11     0     0     1     1    M
12     1     0     1     1    N
13     0     1     1     1    O
14     1     1     1     1    P
  •  Tags:  
  • r
  • Related