Let's take the following data
data <- tibble(a = letters[1:4], b1 = 10:13, b2 = 10:13,
b3 = c(9, 4, 11, 10), b4 = c(50, 10:12))
min <- 9
max <- 15
data
# A tibble: 4 × 5
a b1 b2 b3 b4
<chr> <int> <int> <dbl> <dbl>
1 a 10 10 9 50
2 b 11 11 4 10
3 c 12 12 11 11
4 d 13 13 10 12
I want to create a new column is_unexpected
to control if a value is outside of min
& max
range.
Of course, the slow (in the sense of writing the code) way to do it is the one below. Since I have more than 4 columns (which all begin with "b"), I'm looking to optimise my code. I've tried playing with across()
but unsuccessfully.
data %>%
mutate(is_unexpected =
if_else(b1 < min | b2 < min | b3 < min | b4 < min |
b1 > max | b2 > max | b3 > max | b4 > max,
1, 0)
)
CodePudding user response:
With if_any
:
data %>%
mutate(is_unexpected = (if_any(b1:b4, ~ .x > max | .x < min)))
# A tibble: 4 × 6
a b1 b2 b3 b4 is_unexpected
<chr> <int> <int> <dbl> <dbl> <int>
1 a 10 10 9 50 1
2 b 11 11 4 10 1
3 c 12 12 11 11 0
4 d 13 13 10 12 0
Or with between
:
data %>%
mutate(is_unexpected = (if_any(b1:b4, ~ !between(.x, min, max))))
CodePudding user response:
Using across
data %>%
rowwise() %>%
mutate(is_unexpected = any(across(b1:b4) > !!max | across(b1:b4) < !!min) * 1)
# A tibble: 4 × 6
# Rowwise:
a b1 b2 b3 b4 is_unexpected
<chr> <int> <int> <dbl> <dbl> <dbl>
1 a 10 10 9 50 1
2 b 11 11 4 10 1
3 c 12 12 11 11 0
4 d 13 13 10 12 0
Using !!
here because if the data frame has a min or max variable it uses these instead of the global ones.
CodePudding user response:
Alternatively please check
data %>% mutate(across(where(is.numeric), ~ifelse(.x>min|.x>max, 1, 0), .names = 'new{col}'),
is_unexpected= ifelse(rowSums(across(starts_with('new')))>0,1,0)) %>%
select(-starts_with('new'))
Created on 2023-01-25 with reprex v2.0.2