I am looking for a scalable version of the following.
Given:
library(tidyverse)
df <- data.frame(a = c(1, 2, 3, 4, 5, 3, 1),
b = c(6, 7, 8, 9, 10, 8, 9),
c = c(15, 12, 9, 2, 13, 9, 10))
df
# a b c
# 1 1 6 15
# 2 2 7 12
# 3 3 8 9
# 4 4 9 2
# 5 5 10 13
# 6 3 8 9
# 7 1 9 10
I want to create an indicator variable if any two combinations of the following criteria are valid:
a > 3, b > 8, c > 10
One option to do this manually is to use case_when
:
df %>%
mutate(indicator_variable = case_when(
a > 3 & b > 8 ~ 1,
a > 3 & c > 10 ~ 1,
b > 8 & c > 10 ~ 1,
TRUE ~ 0
))
# a b c indicator_variable
# 1 1 6 15 0
# 2 2 7 12 0
# 3 3 8 9 0
# 4 4 9 2 1
# 5 5 10 13 1
# 6 3 8 9 0
# 7 1 9 10 0
This works but obviously gets more tricky the more variables considered (choose(4, 2)
), change criteria to any two of four variables:
a > 3, b > 8, c > 10, d >= 5
df2 <- data.frame(a = c(1, 2, 3, 4, 5, 3, 1),
b = c(6, 7, 8, 9, 10, 8, 9),
c = c(15, 12, 9, 2, 13, 9, 10),
d = c(1, 2, 3, 4, 5, 6, 5))
df2
# a b c d
# 1 1 6 15 1
# 2 2 7 12 2
# 3 3 8 9 3
# 4 4 9 2 4
# 5 5 10 13 5
# 6 3 8 9 6
# 7 1 9 10 5
df2 %>%
mutate(indicator_variable = case_when(
a > 3 & b > 8 ~ 1,
a > 3 & c > 10 ~ 1,
a > 3 & d >= 5 ~ 1,
b > 8 & c > 10 ~ 1,
b > 8 & d >= 5 ~ 1,
c > 10 & d >= 5 ~ 1,
TRUE ~ 0
))
# a b c d indicator_variable
# 1 1 6 15 1 0
# 2 2 7 12 2 0
# 3 3 8 9 3 0
# 4 4 9 2 4 1
# 5 5 10 13 5 1
# 6 3 8 9 6 0
# 7 1 9 10 5 1
etc.
Best approaches? Doesn't have to involve case_when
, maybe combn
could be used?
thanks
CodePudding user response:
No need to use case_when
here, just make use of the fact that TRUE
converts to 1
in arithmetic operations and FALSE
to 0
.
library(dplyr)
df <- tibble(a = c(1, 2, 3, 4, 5, 3, 1),
b = c(6, 7, 8, 9, 10, 8, 9),
c = c(15, 12, 9, 2, 13, 9, 10))
df %>%
mutate(indicator = as.numeric(((a > 3) (b > 8) (c > 10)) >= 2))
#> # A tibble: 7 × 4
#> a b c indicator
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 6 15 0
#> 2 2 7 12 0
#> 3 3 8 9 0
#> 4 4 9 2 1
#> 5 5 10 13 1
#> 6 3 8 9 0
#> 7 1 9 10 0
CodePudding user response:
Create a function that takes a input for operator, value and dataset
f1 <- function(dat, val, op) op(dat, val)
Then, create two list
s` with values and operator (if the operators are same, we don't need it though)
lst1 <- list(a = 3, b = 8, c = 10, d = 5)
lst2 <- list(a = `>`, b = `>`, c = `>`, d = `>=`)
Now, we do the combn
with Reduce
(using the second data 'df2')
df2$indicator_variable <- (Reduce(`|`, combn(lst1, 2, function(x)
Reduce(`&`, Map(f1, dat = df2[names(x)], val = lst1[names(x)],
op = lst2[names(x)])), simplify = FALSE)))
-output
> df2
a b c d indicator_variable
1 1 6 15 1 0
2 2 7 12 2 0
3 3 8 9 3 0
4 4 9 2 4 1
5 5 10 13 5 1
6 3 8 9 6 0
7 1 9 10 5 1
Or if we want to update the first data
lst1 <- list(a = 3, b = 8, c = 10)
lst2 <- list(a = `>`, b = `>`, c = `>`)
df$indicator_variable <- (Reduce(`|`, combn(lst1, 2, function(x)
Reduce(`&`, Map(f1, dat = df[names(x)], val = lst1[names(x)],
op = lst2[names(x)])), simplify = FALSE)))
> df
a b c indicator_variable
1 1 6 15 0
2 2 7 12 0
3 3 8 9 0
4 4 9 2 1
5 5 10 13 1
6 3 8 9 0
7 1 9 10 0