How to create an if else ladder?

Time:06-24

I have a dataset that looks like this:

   Study_ID Gender SMI BMI
1      100   Male  45  19
2      200   Male  50  20
3      300 Female  60  25
4      400   Male  42  29
5      500 Female  38  32
6      600 Female  50  20
7      700   Male  35  29
8      800   Male  47  31
9      900 Female  65  25

I would like to create a new binary variable called 'Sarcopenia', where a patient must meet certain criteria to be classified as 'Yes' (sarcopenia) or 'No' (no sarcopenia). The criteria differ for males and females.

My desired output would look something like this:

  Study_ID Gender SMI BMI Sarcopenia
1      100   Male  45  19         No
2      200   Male  50  20         No
3      300 Female  60  25         No
4      400   Male  42  29        Yes
5      500 Female  38  32        Yes
6      600 Female  50  20         No
7      700   Male  35  29        Yes
8      800   Male  47  31        Yes
9      900 Female  65  25         No

I've tried to create an if else ladder:

data$Sarcopenia<-
if (data$`SMI` < 41 & data$Gender == "Female") {
"Yes"
} else if (data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25) {
"Yes"
} else if (data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25) {
"Yes"
} else {
"No"
}

But for some reason it gives me 'No' for every single patient (even though I know some of them meet the criteria).

What could I be doing wrong?

Reproducible data:

data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900"),Gender=c("Male","Male","Female","Male","Female","Female","Male","Male","Female"),SMI=c("45","50","60","42","38","50","35","47","65"),BMI=c("19","20","25","29","32","20","29","31","25"))

CodePudding user response:

When running your attempt, R produces a warning: "the condition has length > 1 and only the first element will be used". (From R 4.2.0 onwards this is an error rather than a warning.)

This is because data$SMI < 41 & data$Gender == "Female" returns a vector with TRUE or FALSE for each person in the dataset:

> data$SMI < 41 & data$Gender == "Female"
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

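In R, if is not vectorised: it expects a single TRUE or FALSE, so it uses only the first element of each condition and ignores the rest. Here the first element is FALSE every time, because patient 1 meets none of the criteria, so the ladder returns a single 'No', which is then recycled to every row. This is why you see 'No' for everyone.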
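A small sketch of this, using standalone vectors (the names smi, gender and cond are only for illustration) that hold the same values as the question's columns:

```r
# Same values as the SMI and Gender columns in the question
smi    <- c(45, 50, 60, 42, 38, 50, 35, 47, 65)
gender <- c("Male", "Male", "Female", "Male", "Female",
            "Female", "Male", "Male", "Female")

# The comparison is evaluated for all nine patients at once
cond <- smi < 41 & gender == "Female"

length(cond)  # 9 values, but `if` wants exactly one
cond[1]       # FALSE: the only element an old-style `if` would look at
any(cond)     # TRUE: at least one patient does meet this criterion
```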

There is a different function ifelse which is vectorised, and will work just fine if you use that instead:

# Same patients as in the question, with SMI and BMI stored as numbers
data <- data.frame(Study_ID = 100 * 1:9,
                   Gender = c('Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'),
                   SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
                   BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25))

# Nested ifelse(): each "No" branch falls through to the next test
data$Sarcopenia <-
  ifelse(data$SMI < 41 & data$Gender == "Female", "Yes",
  ifelse(data$SMI < 53 & data$Gender == "Male" & data$BMI >= 25, "Yes",
  ifelse(data$SMI < 43 & data$Gender == "Male" & data$BMI < 25, "Yes", "No")))

data
  Study_ID Gender SMI BMI Sarcopenia
1      100   Male  45  19         No
2      200   Male  50  20         No
3      300 Female  60  25         No
4      400   Male  42  29        Yes
5      500 Female  38  32        Yes
6      600 Female  50  20         No
7      700   Male  35  29        Yes
8      800   Male  47  31        Yes
9      900 Female  65  25         No
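Note that the data frame above stores SMI and BMI as numbers, whereas the question's reproducible data stores them as character strings. Comparing a string with a number makes R compare them as text, which happens to work for two-digit values but can go wrong otherwise. A minimal sketch, using a made-up three-row data frame (the value "9" is hypothetical, chosen to show the problem):

```r
# Hypothetical small example with SMI stored as text
data <- data.frame(Study_ID = c("100", "200", "300"),
                   SMI = c("45", "9", "60"),
                   stringsAsFactors = FALSE)

# 41 is coerced to "41", and "9" sorts after "41" as text
text_result <- "9" < 41        # FALSE, although 9 < 41 numerically

# Convert once, then compare as numbers
data$SMI <- as.numeric(data$SMI)
numeric_result <- data$SMI < 41   # FALSE TRUE FALSE
```

Applying the same as.numeric() conversion to BMI before building the ladder should avoid this kind of surprise.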

CodePudding user response:

library(dplyr)

data$Sarcopenia <-
  case_when(data$`SMI` < 41 & data$Gender == "Female" ~ "Yes", 
            data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25 ~ "Yes",
            data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25 ~ "Yes", 
            TRUE ~ "No")

Or, without dplyr, you can combine the three conditions into a single ifelse() call (though that is no longer really an if-else ladder, and it is less readable):

data$Sarcopenia <- 
  ifelse((data$`SMI` < 41 & data$Gender == "Female") |
           (data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25) | 
           (data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25), "Yes", "No")
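Another base R option, offered only as a sketch, is to work out the SMI cut-off that applies to each patient first and then make a single comparison. The column name cutoff is hypothetical, and the code assumes Gender only ever takes the values "Male" and "Female" (anything else would be treated with the male cut-offs, unlike the versions above, which would return "No").

```r
data <- data.frame(Study_ID = 100 * 1:9,
                   Gender = c("Male", "Male", "Female", "Male", "Female",
                              "Female", "Male", "Male", "Female"),
                   SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
                   BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25))

# Cut-off per patient: 41 for females, 53 for males with BMI >= 25,
# 43 for males with BMI < 25 (thresholds taken from the question)
data$cutoff <- ifelse(data$Gender == "Female", 41,
               ifelse(data$BMI >= 25, 53, 43))

# One vectorised comparison against the patient's own cut-off
data$Sarcopenia <- ifelse(data$SMI < data$cutoff, "Yes", "No")
```

Keeping the cut-off as its own column also makes it easy to check which threshold was applied to each row.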

CodePudding user response:

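You should use the vectorized ifelse() instead of if ... else.

# & binds more tightly than |, so each gender's criteria are grouped as intended.
# transform() returns a new data frame, so assign the result back to data.
data <- transform(data,
  Sarcopenia = ifelse(Gender == "Female" &  SMI < 41 |
                      Gender ==   "Male" & (SMI < 53 & BMI >= 25 | SMI < 43 & BMI < 25),
                      "Yes", "No")
)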
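If you would still prefer a literal if/else ladder, one possible sketch is to wrap it in a function that classifies a single patient and call that function once per row. The helper name classify_one is hypothetical, and this is likely to be slower than ifelse() on large data.

```r
data <- data.frame(Study_ID = 100 * 1:9,
                   Gender = c("Male", "Male", "Female", "Male", "Female",
                              "Female", "Male", "Male", "Female"),
                   SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
                   BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25))

# Hypothetical helper: the original ladder, applied to ONE patient.
# && is the scalar "and", which is what `if` expects.
classify_one <- function(smi, gender, bmi) {
  if (smi < 41 && gender == "Female") {
    "Yes"
  } else if (smi < 53 && gender == "Male" && bmi >= 25) {
    "Yes"
  } else if (smi < 43 && gender == "Male" && bmi < 25) {
    "Yes"
  } else {
    "No"
  }
}

# mapply() calls classify_one() once per row and collects the results
data$Sarcopenia <- unname(mapply(classify_one, data$SMI, data$Gender, data$BMI))
```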