I have 5 columns with numerical data and I would like to filter for rows that match a data range in at least 3 of the 5 columns.

For example i have the following data frame and I define a value range of 5-10. My first row has 3 columns with values between 5 and 10, so i want to keep that row. The second row only has 2 values between 5 and 10, so I want to remove it.

column1	column2	column3	column4	column5
7	4	10	9	2
4	8	2	6	2

CodePudding user response：

First test if values in columns are greater or equal 5 and less or equal than 10, then look for rows with 3 or more that fit the condition.

dat[ rowSums( dat >= 5 & dat <= 10 ) >= 3, ]
  column1 column2 column3 column4 column5
1       7       4      10       9       2

Data

dat <- structure(list(column1 = c(7L, 4L), column2 = c(4L, 8L), column3 = c(10L, 
2L), column4 = c(9L, 6L), column5 = c(2, 2)), class = "data.frame", row.names = c(NA, 
-2L))

CodePudding user response：

I'd like to share a second approach:

# Setting up data
my_df <- tibble::tibble(A = c(7,4), B = c(4,8), C = c(10, 2), D = c(9,6), E = c(2,2), X = c("some", "character"))
my_min <- 5
my_max <- 10

Then do some tidyverse-magic:

# This is verbose, but shows clearly all the steps involved:
my_df_filtered <- my_df %>% 
  dplyr::mutate(n_cols_in_range = dplyr::across(where(is.numeric), ~ .x >= my_min & .x <= my_max)
                ) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(n_cols_in_range = sum(n_cols_in_range, na.rm = TRUE)
                ) %>%
  dplyr::filter(n_cols_in_range >= 3
                ) %>%
  dplyr::select(-n_cols_in_range)

The above is equivalent to:

my_df_filtered <- my_df %>% 
  dplyr::rowwise() %>%
  dplyr::filter(sum(dplyr::across(where(is.numeric), ~ .x >= my_min & .x <= my_max), na.rm = TRUE) >= 3)

But I must state, that the above answer is clearly more elegant since it only needs 1 line of code!