I have 5 columns with numerical data and I would like to filter for rows that match a data range in at least 3 of the 5 columns.
For example i have the following data frame and I define a value range of 5-10. My first row has 3 columns with values between 5 and 10, so i want to keep that row. The second row only has 2 values between 5 and 10, so I want to remove it.
column1 | column2 | column3 | column4 | column5 |
---|---|---|---|---|
7 | 4 | 10 | 9 | 2 |
4 | 8 | 2 | 6 | 2 |
CodePudding user response:
First test if values in columns are greater or equal 5 and less or equal than 10, then look for rows with 3 or more that fit the condition.
dat[ rowSums( dat >= 5 & dat <= 10 ) >= 3, ]
column1 column2 column3 column4 column5
1 7 4 10 9 2
Data
dat <- structure(list(column1 = c(7L, 4L), column2 = c(4L, 8L), column3 = c(10L,
2L), column4 = c(9L, 6L), column5 = c(2, 2)), class = "data.frame", row.names = c(NA,
-2L))
CodePudding user response:
I'd like to share a second approach:
# Setting up data
my_df <- tibble::tibble(A = c(7,4), B = c(4,8), C = c(10, 2), D = c(9,6), E = c(2,2), X = c("some", "character"))
my_min <- 5
my_max <- 10
Then do some tidyverse-magic:
# This is verbose, but shows clearly all the steps involved:
my_df_filtered <- my_df %>%
dplyr::mutate(n_cols_in_range = dplyr::across(where(is.numeric), ~ .x >= my_min & .x <= my_max)
) %>%
dplyr::rowwise() %>%
dplyr::mutate(n_cols_in_range = sum(n_cols_in_range, na.rm = TRUE)
) %>%
dplyr::filter(n_cols_in_range >= 3
) %>%
dplyr::select(-n_cols_in_range)
The above is equivalent to:
my_df_filtered <- my_df %>%
dplyr::rowwise() %>%
dplyr::filter(sum(dplyr::across(where(is.numeric), ~ .x >= my_min & .x <= my_max), na.rm = TRUE) >= 3)
But I must state, that the above answer is clearly more elegant since it only needs 1 line of code!