Create a factor variable based on conditions across multiple columns in R

Time:06-29

Data:

df1

I want to create a new factor variable, CH, coded 1/0. CH should be 1 if any of the Sys variables in a row is >= 140 OR any of the Dia variables in that row is >= 90, and 0 otherwise.

My real dataset has 25 Sys variables and 25 Dia variables.
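A minimal base-R sketch of this rule follows. It assumes the columns are named with the prefixes Sys and Dia as described. The three-row df1 is made-up illustration data, because the original df1 isn't shown. Selecting the columns by prefix avoids typing out all 50 comparisons.

```r
# Hypothetical example data: 3 rows, 3 Sys and 3 Dia columns
df1 <- data.frame(
  Sys1 = c(120, 145, 130), Sys2 = c(118, 110, 135), Sys3 = c(125, 112, NA),
  Dia1 = c(80, 70, 85),    Dia2 = c(78, 72, 92),    Dia3 = c(82, 75, 88)
)

# Pick out each column group by name prefix instead of typing every name
sys_cols <- startsWith(names(df1), "Sys")
dia_cols <- startsWith(names(df1), "Dia")

# rowSums() counts, per row, how many readings meet the threshold;
# "> 0" turns that count into "any"; na.rm = TRUE skips missing readings
high_sys <- rowSums(df1[sys_cols] >= 140, na.rm = TRUE) > 0
high_dia <- rowSums(df1[dia_cols] >= 90, na.rm = TRUE) > 0

# Combine the two conditions and store the result as a 0/1 factor
df1$CH <- factor(as.integer(high_sys | high_dia), levels = c(0, 1))
df1$CH
```

Because the comparison is applied to the whole block of columns at once, the same code works unchanged for 25 + 25 columns. Whether missing readings should count as "not high" (na.rm = TRUE) is an assumption to check against the real data.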

I have tried using dplyr's if_else():

library(dplyr)
df1$CH <- if_else(df1$Sys1 >= 140 | df1$Sys2 >= 140 | df1$Sys3 >= 140 |
                  df1$Sys4 >= 140 | df1$Sys5 >= 140 | df1$Sys6 >= 140 |
                  df1$Dia1 <= 90 | df1$Dia2 <= 90 | df1$Dia3 <= 90 |
                  df1$Dia4 <= 90 | df1$Dia5 <= 90 | df1$Dia6 <= 90, 1, 0)

But this has not worked. Writing out every comparison also takes a long time with the number of variables in my real dataset.

What is a quick and accurate way to do this?

Any help would be greatly appreciated; I haven't found an answer in any existing similar questions.

Answer:

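Here's a tidyverse solution using rowwise() and c_across(). The first part only builds example data in the same wide layout: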

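library(dplyr)
library(tidyr)

# Example data in long form: 3 ids x 6 readings (random values, no seed)
# id 1: Sys 10-100, Dia 91-125; ids 2-3: Sys 110-145, Dia 70-95
dat <- expand.grid(id = 1:3, num = 1:6)
dat$Sys <- NA
dat$Sys[which(dat$id == 1)] <- runif(6, 10, 100)
dat$Sys[which(dat$id != 1)] <- runif(12, 110, 145)
dat$Dia <- NA
dat$Dia[which(dat$id == 1)] <- runif(6, 91, 125)
dat$Dia[which(dat$id != 1)] <- runif(12, 70, 95)

# Reshape to one row per id with columns Sys1..Sys6 and Dia1..Dia6
dat <- dat %>% pivot_wider(values_from = c("Sys", "Dia"),
                           names_from = "num",
                           names_sep = "")

# rowwise() makes c_across() gather the Sys/Dia values of each row
dat %>%
  rowwise() %>%
  mutate(CH = case_when(any(c_across(contains("Sys")) >= 140) |
                          any(c_across(contains("Dia")) >= 90) ~ 1,
                        TRUE ~ 0))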
#> # A tibble: 3 × 14
#> # Rowwise: 
#>      id  Sys1  Sys2  Sys3  Sys4  Sys5  Sys6  Dia1  Dia2  Dia3  Dia4  Dia5  Dia6
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1  71.3  78.2  47.4  53.6  67.0  47.5 107.  114.  106.  112.  104.  108. 
#> 2     2 114.  113.  125.  142.  142.  116.   71.8  82.2  73.4  75.8  70.4  93.1
#> 3     3 144.  136.  118.  112.  133.  126.   77.6  88.2  85.6  91.6  75.9  77.9
#> # … with 1 more variable: CH <dbl>

Created on 2022-06-28 by the reprex package (v2.0.1)
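If dplyr 1.0.4 or later is available (an assumption), if_any() expresses the same "any column in this group" test without rowwise(). Because it operates on whole columns, it avoids the per-row overhead. The sketch below uses a small hypothetical df1 and also converts CH to a factor, as the question asks.

```r
library(dplyr)

# Hypothetical wide data in the same layout as above (Sys1..., Dia1...)
df1 <- tibble(
  id   = 1:3,
  Sys1 = c(120, 145, 130), Sys2 = c(118, 110, 135),
  Dia1 = c(80, 70, 85),    Dia2 = c(78, 72, 92)
)

df1 <- df1 %>%
  mutate(
    # if_any() is TRUE when the condition holds in at least one selected
    # column; it works column-wise, so no rowwise() is needed
    CH = if_else(
      if_any(starts_with("Sys"), ~ .x >= 140) |
        if_any(starts_with("Dia"), ~ .x >= 90),
      1, 0
    ),
    # store as a 0/1 factor
    CH = factor(CH, levels = c(0, 1))
  )

df1$CH
```

A row whose comparisons are all FALSE or NA gives NA rather than 0, because NA | FALSE is NA. If missing readings should count as "not high", passing missing = 0 to if_else() is one option.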
