test values within data frame R

Time:10-18

I'm trying to figure out how to write a loop that tests whether a value in any of several columns lies between the values in two fixed columns of a data frame (greater than or equal to one, and less than or equal to the other). I'd like a 1/0 output, and I'd like to drop all the columns that were tested. My current solution uses an embarrassing number of mutate calls to create new TRUE/FALSE columns, and then a Reduce call to check whether TRUE is present in any column from a set position to the end of the data frame. Any help would be appreciated!

example:

library(tidyverse)

df3 = data.frame(X = sample(1:3, 15, replace = TRUE),
                 Y = sample(1:3, 15, replace = TRUE),
                 Z = sample(1:3, 15, replace = TRUE),
                 A = sample(1:3, 15, replace = TRUE))

df3 <- df3 %>% mutate(T1 = Z >= X & Z <= Y,
                      T2 = A >= X & A <= Y)

df3$check <- Reduce(`|`, lapply(df3[5:6], `==`, TRUE))

CodePudding user response:

df3 %>%
  mutate(check = if_any(Z:A, function(x) {x >= X & x <= Y}))
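The call above returns a logical column and keeps Z and A. Since the question asks for a 1/0 result with the tested columns dropped, here is a minimal sketch of one way to extend it, assuming dplyr 1.0.4 or later (for if_any) and using the values from the data posted at the end of this thread. The name res is only illustrative.

```r
library(dplyr)

# Same values as the data posted at the end of this thread
df3 <- data.frame(
  X = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 1, 1, 2, 2, 2),
  Y = c(3, 3, 1, 1, 3, 1, 3, 1, 1, 2, 3, 2, 1, 2, 2),
  Z = c(3, 3, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 1, 1, 2),
  A = c(2, 2, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 1, 1, 1)
)

res <- df3 %>%
  # if_any() gives TRUE/FALSE per row; as.integer() converts that to 1/0
  mutate(check = as.integer(if_any(Z:A, ~ .x >= X & .x <= Y))) %>%
  # drop the columns that were tested
  select(-(Z:A))

res$check
# [1] 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1
```

The range Z:A can be widened to cover however many columns need testing, as long as they sit next to each other in the data frame.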

CodePudding user response:

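You can compare the entire subset df3[c('A', 'Z')] against X and Y at once, which should be more efficient than testing column by column. The comparison yields a logical matrix with one entry per row and tested column, so a rowSums value greater than zero means at least one tested column falls within the range for that row.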

To understand the logic:

cols <- c('A', 'Z')
as.integer(rowSums(df3[cols] >= df3$X & df3[cols] <= df3$Y) > 0)
# [1] 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1

To create the column:

transform(df3, check=as.integer(rowSums(df3[cols] >= X & df3[cols] <= Y) > 0))
#    X Y Z A check
# 1  1 3 3 2     1
# 2  1 3 3 2     1
# 3  1 1 2 2     0
# 4  1 1 2 2     0
# 5  2 3 2 3     1
# 6  2 1 2 2     0
# 7  2 3 1 1     0
# 8  1 1 1 2     1
# 9  3 1 2 3     0
# 10 3 2 2 2     0
# 11 1 3 3 2     1
# 12 1 2 3 2     1
# 13 2 1 1 1     0
# 14 2 2 1 1     0
# 15 2 2 2 1     1
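The transform call keeps Z and A in the result. If the tested columns should also be dropped, as the question asks, the same rowSums logic could be wrapped in a small function. This is only a sketch: check_between and its lower/upper arguments are hypothetical names, not part of the answer above, and the data values are the ones posted below.

```r
# Same values as the data posted in this answer
df3 <- data.frame(
  X = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L),
  Y = c(3L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 2L, 3L, 2L, 1L, 2L, 2L),
  Z = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L),
  A = c(2L, 2L, 2L, 2L, 3L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L)
)

# Hypothetical helper: flag rows where any column in `cols` lies within
# [lower, upper], then return the data without the tested columns.
check_between <- function(data, cols, lower = "X", upper = "Y") {
  # logical matrix: one row per observation, one column per tested column
  hit <- data[cols] >= data[[lower]] & data[cols] <= data[[upper]]
  out <- data[setdiff(names(data), cols)]
  out$check <- as.integer(rowSums(hit) > 0)
  out
}

res <- check_between(df3, c("Z", "A"))
names(res)
# [1] "X"     "Y"     "check"
```

Passing the column names as a character vector means the list of tested columns can grow without changing the function body.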

Data:

df3 <- structure(list(X = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 3L, 
1L, 1L, 2L, 2L, 2L), Y = c(3L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 
2L, 3L, 2L, 1L, 2L, 2L), Z = c(3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 3L, 3L, 1L, 1L, 2L), A = c(2L, 2L, 2L, 2L, 3L, 2L, 1L, 
2L, 3L, 2L, 2L, 2L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-15L))