Home > Mobile >  dplyr detect value outside range across multiple columns
dplyr detect value outside range across multiple columns

Time:01-25

Let's take the following data

data <- tibble(a = letters[1:4], b1 = 10:13, b2 = 10:13, 
               b3 = c(9, 4, 11, 10), b4 = c(50, 10:12))
min <- 9
max <- 15

data
# A tibble: 4 × 5
  a        b1    b2    b3    b4
  <chr> <int> <int> <dbl> <dbl>
1 a        10    10     9    50
2 b        11    11     4    10
3 c        12    12    11    11
4 d        13    13    10    12

I want to create a new column is_unexpected to control if a value is outside of min & max range.

Of course, the slow (in the sense of writing the code) way to do it is the one below. Since I have more than 4 columns (which all begin with "b"), I'm looking to optimise my code. I've tried playing with across() but unsuccessfully.

data %>% 
  mutate(is_unexpected = 
           if_else(b1 < min | b2 < min | b3 < min | b4 < min | 
                     b1 > max | b2 > max | b3 > max | b4 > max,
                   1, 0)
  )

CodePudding user response:

With if_any:

data %>% 
  mutate(is_unexpected =  (if_any(b1:b4, ~ .x > max | .x < min)))

# A tibble: 4 × 6
  a        b1    b2    b3    b4 is_unexpected
  <chr> <int> <int> <dbl> <dbl>         <int>
1 a        10    10     9    50             1
2 b        11    11     4    10             1
3 c        12    12    11    11             0
4 d        13    13    10    12             0

Or with between:

data %>% 
  mutate(is_unexpected =  (if_any(b1:b4, ~ !between(.x, min, max))))

CodePudding user response:

Using across

data %>% 
  rowwise() %>% 
  mutate(is_unexpected = any(across(b1:b4) > !!max | across(b1:b4) < !!min) * 1)
# A tibble: 4 × 6
# Rowwise: 
  a        b1    b2    b3    b4 is_unexpected
  <chr> <int> <int> <dbl> <dbl>         <dbl>
1 a        10    10     9    50             1
2 b        11    11     4    10             1
3 c        12    12    11    11             0
4 d        13    13    10    12             0

Using !! here because if the data frame has a min or max variable it uses these instead of the global ones.

CodePudding user response:

Alternatively please check

data %>% mutate(across(where(is.numeric), ~ifelse(.x>min|.x>max, 1, 0), .names = 'new{col}'),
 is_unexpected= ifelse(rowSums(across(starts_with('new')))>0,1,0)) %>% 
select(-starts_with('new'))

Created on 2023-01-25 with reprex v2.0.2

  • Related