I'm trying to figure out how to write a loop that tests whether a value in any of several columns falls between the values in two fixed columns of a data frame (greater than or equal to one, less than or equal to the other). I'd like a 1/0 output, and I'd like to drop all the columns that are tested. My solution has an embarrassing number of mutates that create new TRUE/FALSE columns, and then uses Reduce to check whether TRUE appears in any of the columns from a set position to the end of the data frame. Any help on this would be appreciated!
Example:
library(tidyverse)
df3 = data.frame(X = sample(1:3, 15, replace = TRUE),
                 Y = sample(1:3, 15, replace = TRUE),
                 Z = sample(1:3, 15, replace = TRUE),
                 A = sample(1:3, 15, replace = TRUE))
# One TRUE/FALSE helper column per tested column: is Z (or A) between X and Y?
df3 <- df3 %>% mutate(T1 = Z >= X & Z <= Y,
                      T2 = A >= X & A <= Y)
# TRUE if any helper column (positions 5 and 6) is TRUE
df3$check <- Reduce(`|`, lapply(df3[5:6], `==`, TRUE))
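A possible base R sketch of the same Reduce idea that skips the T1/T2 helper columns: the TRUE/FALSE vectors are kept in a list, ORed together, converted to 1/0, and the tested columns are then dropped. `cols` and `in_range` are hypothetical names, and the fixed 15-row data from the Data: section is used instead of `sample()` so the result is reproducible.

```r
# The 15-row example data from the "Data:" section, written out so this runs standalone
df3 <- data.frame(
  X = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 1, 1, 2, 2, 2),
  Y = c(3, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 2, 1, 2, 2),
  Z = c(3, 3, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 2),
  A = c(2, 2, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 1, 1, 1)
)
# Hypothetical: the columns to test against the bounds X (lower) and Y (upper)
cols <- c("Z", "A")
# One TRUE/FALSE vector per tested column, kept in a list instead of new columns
in_range <- lapply(df3[cols], function(v) v >= df3$X & v <= df3$Y)
# OR the vectors element-wise, turn TRUE/FALSE into 1/0, then drop the tested columns
df3$check <- as.integer(Reduce(`|`, in_range))
df3 <- df3[setdiff(names(df3), cols)]
print(df3)
```

Because the comparisons live in a temporary list, nothing has to be added to or removed from the data frame except the final flag.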
Answer:
# if_any() needs dplyr 1.0.5 or later; as.integer() turns TRUE/FALSE into 1/0
df3 %>%
  mutate(check = as.integer(if_any(Z:A, function(x) {x >= X & x <= Y})))
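With many tested columns, `if_any()` also accepts a tidyselect negation such as `-c(X, Y)`, meaning every column except the two bounds. The sketch below assumes dplyr 1.0.5 or later and the example data from the Data: section; it returns 1/0 and drops the tested columns with `select()`. `out` is a hypothetical name.

```r
library(dplyr)
# The 15-row example data from the "Data:" section, written out so this runs standalone
df3 <- data.frame(
  X = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 1, 1, 2, 2, 2),
  Y = c(3, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 2, 1, 2, 2),
  Z = c(3, 3, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 2),
  A = c(2, 2, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 1, 1, 1)
)
out <- df3 %>%
  # Test every column except the bounds; TRUE if any of them lies in [X, Y]
  mutate(check = as.integer(if_any(-c(X, Y), function(x) x >= X & x <= Y))) %>%
  # Keep the bounds and the 1/0 flag, dropping every tested column
  select(X, Y, check)
print(out)
```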
Another answer:
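You can compare the entire subset df3[c('A', 'Z')] at once, which should be more efficient than building one helper column per tested column. Comparing the subset with df3$X or df3$Y recycles that vector down every column, so the result is a logical matrix with one row per observation. A row qualifies when its rowSums (the number of TRUE values) is greater than zero.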
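To see what is being summed, here is a small sketch on the example data that keeps the intermediate logical matrix and the per-row counts (`hits`, `n_hits` and `check` are hypothetical names).

```r
# The 15-row example data from the "Data:" section, written out so this runs standalone
df3 <- data.frame(
  X = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 1, 1, 2, 2, 2),
  Y = c(3, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 2, 1, 2, 2),
  Z = c(3, 3, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 2),
  A = c(2, 2, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 1, 1, 1)
)
cols <- c("A", "Z")
# Comparing a data frame with a length-15 vector recycles it down each column,
# so both sides are 15 x 2 logical matrices and & combines them cell by cell
hits <- df3[cols] >= df3$X & df3[cols] <= df3$Y
print(dim(hits))
# Number of tested columns in range for each row (0, 1 or 2)
n_hits <- rowSums(hits)
print(n_hits)
# Any hit at all -> 1, none -> 0
check <- as.integer(n_hits > 0)
print(check)
```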
To understand the logic:
cols <- c('A', 'Z')
as.integer(rowSums(df3[cols] >= df3$X & df3[cols] <= df3$Y) > 0)
# [1] 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1
To create the column:
transform(df3, check=as.integer(rowSums(df3[cols] >= X & df3[cols] <= Y) > 0))
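`transform()` keeps the tested columns, whereas the question asks to drop them. A minimal sketch, assuming every column other than X and Y should be tested: pick those columns with `setdiff()`, compute the flag, then subset them away (`out` is a hypothetical name).

```r
# The 15-row example data from the "Data:" section, written out so this runs standalone
df3 <- data.frame(
  X = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 1, 1, 2, 2, 2),
  Y = c(3, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 2, 1, 2, 2),
  Z = c(3, 3, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 2),
  A = c(2, 2, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 1, 1, 1)
)
# Test every column except the two bounds, however many there are
cols <- setdiff(names(df3), c("X", "Y"))
out <- transform(df3, check = as.integer(rowSums(df3[cols] >= X & df3[cols] <= Y) > 0))
# Drop the tested columns, keeping X, Y and the 1/0 flag
out <- out[setdiff(names(out), cols)]
print(out)
```

Selecting by name rather than by position (as in `df3[5:6]`) keeps the code correct if columns are added or reordered.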
#    X Y Z A check
# 1  1 3 3 2     1
# 2  1 3 3 2     1
# 3  1 1 2 2     0
# 4  1 1 2 2     0
# 5  2 3 2 3     1
# 6  2 1 2 2     0
# 7  2 3 1 1     0
# 8  1 1 1 2     1
# 9  3 1 2 3     0
# 10 3 2 2 2     0
# 11 1 3 3 2     1
# 12 1 2 3 2     1
# 13 2 1 1 1     0
# 14 2 2 1 1     0
# 15 2 2 2 1     1
Data:
# Named df3 to match the code above
df3 <- structure(list(X = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L),
                      Y = c(3L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 2L, 3L, 2L, 1L, 2L, 2L),
                      Z = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L),
                      A = c(2L, 2L, 2L, 2L, 3L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L)),
                 class = "data.frame", row.names = c(NA, -15L))