I have a dataset that looks like this:
Study_ID Gender SMI BMI
1 100 Male 45 19
2 200 Male 50 20
3 300 Female 60 25
4 400 Male 42 29
5 500 Female 38 32
6 600 Female 50 20
7 700 Male 35 29
8 800 Male 47 31
9 900 Female 65 25
I would like to create a new binary variable called 'Sarcopenia', where a patient must meet certain criteria to be classified as 'Yes' and is otherwise 'No'. The criteria differ for males and females.
My desired output would look something like this:
Study_ID Gender SMI BMI Sarcopenia
1 100 Male 45 19 No
2 200 Male 50 20 No
3 300 Female 60 25 No
4 400 Male 42 29 Yes
5 500 Female 38 32 No
6 600 Female 50 20 Yes
7 700 Male 35 29 No
8 800 Male 47 31 Yes
9 900 Female 65 25 No
I've tried to create an if else ladder:
data$Sarcopenia<-
if (data$`SMI` < 41 & data$Gender == "Female") {
"Yes"
} else if (data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25) {
"Yes"
} else if (data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25) {
"Yes"
} else {
"No"
}
But for some reason, it gives me 'No' for every single patient (even though I know some of them meet the criteria).
What could I be doing wrong?
Reproducible data:
data<-data.frame(Study_ID=c("100","200","300","400","500","600","700","800","900"),Gender=c("Male","Male","Female","Male","Female","Female","Male","Male","Female"),SMI=c("45","50","60","42","38","50","35","47","65"),BMI=c("19","20","25","29","32","20","29","31","25"))
CodePudding user response:
When running your attempt, R produces a warning: "the condition has length > 1 and only the first element will be used". (From R 4.2.0 onwards this is an error rather than a warning.)
This is because data$SMI < 41 & data$Gender == "Female" returns a vector with TRUE or FALSE for each person in the dataset:
> data$SMI < 41 & data$Gender == "Female"
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
In R, if is not vectorised: it expects a single TRUE or FALSE, so it just uses the first element of the vector (in your case FALSE) and ignores the rest. The single "No" it returns is then recycled across the whole column, which is why you see 'No' for everyone.
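To see this for yourself, here is a minimal sketch with a made-up three-element vector (cond and result are illustrative names, not from your data). Because the behaviour of if depends on your R version, the call is wrapped in tryCatch so that it runs either way:

```r
# Hypothetical condition vector, standing in for
# data$SMI < 41 & data$Gender == "Female"
cond <- c(FALSE, TRUE, FALSE)

# The condition holds one value per row, not a single TRUE/FALSE
n_values <- length(cond)
print(n_values)

# if() expects exactly one value; passing three triggers a warning
# (older R) or an error (R >= 4.2.0). tryCatch turns either into a label.
result <- tryCatch(
  if (cond) "Yes" else "No",
  warning = function(w) "warning",
  error   = function(e) "error"
)
print(result)
```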
There is a different function, ifelse, which is vectorised and will work just fine if you use that instead:
data <- data.frame(Study_ID = 100 * 1:9,
                   Gender = c('Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'),
                   SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
                   BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25))
data$Sarcopenia <-
  ifelse(data$`SMI` < 41 & data$Gender == "Female", "Yes",
  ifelse(data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25, "Yes",
  ifelse(data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25, "Yes", "No")))
data
Study_ID Gender SMI BMI Sarcopenia
1 100 Male 45 19 No
2 200 Male 50 20 No
3 300 Female 60 25 No
4 400 Male 42 29 Yes
5 500 Female 38 32 Yes
6 600 Female 50 20 No
7 700 Male 35 29 Yes
8 800 Male 47 31 Yes
9 900 Female 65 25 No
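One thing to watch: the reproducible data in your question stores SMI and BMI as character strings, whereas I typed them as numbers above. Comparing text with < follows text ordering rather than numeric ordering, so it seems safer to convert first. A minimal sketch, assuming the columns contain only clean numeric text (the three-row data frame and the name num_cols are made up for illustration):

```r
# Made-up example mirroring the question's layout, with numbers stored as text
data <- data.frame(
  Study_ID = c("100", "200", "300"),
  Gender   = c("Male", "Female", "Male"),
  SMI      = c("45", "38", "9"),
  BMI      = c("19", "32", "29"),
  stringsAsFactors = FALSE
)

# Convert the measurement columns to numeric before applying any thresholds
num_cols <- c("SMI", "BMI")
data[num_cols] <- lapply(data[num_cols], as.numeric)

# Inspect the column types after conversion
str(data)
```

as.numeric returns NA (with a warning) for any entry it cannot parse, so it is worth checking anyNA(data$SMI) afterwards.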
CodePudding user response:
library(dplyr)
# case_when checks the conditions in order; the first one that matches wins
data$Sarcopenia <-
  case_when(data$`SMI` < 41 & data$Gender == "Female" ~ "Yes",
            data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25 ~ "Yes",
            data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25 ~ "Yes",
            TRUE ~ "No")
Or without dplyr, using ifelse (but that is not really an if-else ladder, and definitely less readable):
data$Sarcopenia <-
  ifelse((data$`SMI` < 41 & data$Gender == "Female") |
         (data$`SMI` < 53 & data$Gender == "Male" & data$BMI >= 25) |
         (data$`SMI` < 43 & data$Gender == "Male" & data$BMI < 25), "Yes", "No")
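If readability in base R is the concern, one possible alternative (my own restructuring, not something from the question) is to work out which SMI cutoff applies to each patient first and then make a single comparison. This sketch assumes Gender is always either "Male" or "Female" and that there are no missing values; smi_cutoff is a name I made up:

```r
# Numeric version of the question's data
data <- data.frame(
  Study_ID = 100 * 1:9,
  Gender = c("Male", "Male", "Female", "Male", "Female",
             "Female", "Male", "Male", "Female"),
  SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
  BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25)
)

# Step 1: pick the cutoff per patient
# (females: 41; males with BMI >= 25: 53; remaining males: 43)
smi_cutoff <- ifelse(data$Gender == "Female", 41,
                     ifelse(data$BMI >= 25, 53, 43))

# Step 2: one vectorised comparison against that cutoff
data$Sarcopenia <- ifelse(data$SMI < smi_cutoff, "Yes", "No")

print(data)
```

Keeping the thresholds in one place makes them easier to audit, at the cost of an extra intermediate vector.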
CodePudding user response:
You should use the vectorized ifelse() instead of if ... else:
# & binds more tightly than |, so each Gender clause is evaluated as a unit
transform(data,
  Sarcopenia = ifelse(Gender == "Female" & SMI < 41 |
                      Gender == "Male" & (SMI < 53 & BMI >= 25 | SMI < 43 & BMI < 25),
                      "Yes", "No")
)
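Note that transform() returns a modified copy and leaves data itself untouched, so you need to assign the result if you want to keep the new column. A minimal sketch, assuming numeric SMI and BMI columns (the name result is just for illustration):

```r
# Numeric version of the question's data
data <- data.frame(
  Study_ID = 100 * 1:9,
  Gender = c("Male", "Male", "Female", "Male", "Female",
             "Female", "Male", "Male", "Female"),
  SMI = c(45, 50, 60, 42, 38, 50, 35, 47, 65),
  BMI = c(19, 20, 25, 29, 32, 20, 29, 31, 25),
  stringsAsFactors = FALSE
)

# Assign the output of transform(); the original data frame is not changed
result <- transform(data,
  Sarcopenia = ifelse(Gender == "Female" & SMI < 41 |
                      Gender == "Male" & (SMI < 53 & BMI >= 25 | SMI < 43 & BMI < 25),
                      "Yes", "No")
)

# The new column exists only in the assigned copy
print("Sarcopenia" %in% names(data))
print("Sarcopenia" %in% names(result))
```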