Is there a way to use if_all less stringently in the filter function of dplyr
?
# make this example reproducible
set.seed(1)
# create a data frame with 10 random numbers between 1 and 20
df <- as.data.frame(matrix(runif(n=25, min=1, max=20), nrow=5))
# define column names
names(df) <- c('A.length', 'b.length','C.length','D.length','E.length')
> df
A.length b.length C.length D.length E.length
1 6.044665 18.069404 4.913517 10.456286 18.759399
2 8.070354 18.948830 4.354578 14.634752 5.030708
3 11.884214 13.555158 14.053434 19.846216 13.381802
4 18.255948 12.953167 8.297971 8.220668 3.385547
5 4.831957 2.173939 15.626987 15.771459 6.077193
I want to use a filter function to filter rows where all columns or all columns except 1 are above a specific value.
Something like the following but row number 3 should be included since only 1 column doesn't fulfill the criteria.
df %>%
filter(if_all(ends_with("length"), ~ .x > 13))
# [1] A.length b.length C.length D.length E.length
# <0 rows> (or 0-length row.names)
CodePudding user response:
Maybe if_any
and if_all
cannot fit your case. A workaround is to use rowSums(across(...)) >= n
to determine whether there are n
or more values greater than 13 in each row.
df %>%
filter(rowSums(across(ends_with("length")) > 13) >= 4)
# A.length b.length C.length D.length E.length
# 1 11.88421 13.55516 14.05343 19.84622 13.3818
The 3rd row are extracted from df
:
df
# A.length b.length C.length D.length E.length
# 1 6.044665 18.069404 4.913517 10.456286 18.759399
# 2 8.070354 18.948830 4.354578 14.634752 5.030708
# 3 11.884214 13.555158 14.053434 19.846216 13.381802
# 4 18.255948 12.953167 8.297971 8.220668 3.385547
# 5 4.831957 2.173939 15.626987 15.771459 6.077193
CodePudding user response:
This may not be the best of the approach but an alternate
library(tidyverse)
df %>% rowwise %>%
mutate(across(ends_with('length'), ~ ifelse(.x>13,1,NA), .names = 'log{col}'),
sumx=sum(across(starts_with('log')), na.rm = T)
) %>% filter(sumx>=4) %>% select(-starts_with('log'),-sumx)
Created on 2023-01-19 with reprex v2.0.2
# A tibble: 1 × 5
# Rowwise:
A.length b.length C.length D.length E.length
<dbl> <dbl> <dbl> <dbl> <dbl>
1 11.9 13.6 14.1 19.8 13.4