I'm new to R and I'm stuck. I'm working on a health dataset with each row as one patient's information.
I have a variable called diag_code. It holds the patient's medical condition as a numeric diagnostic code. I want to group the individual condition codes into broader categories (heart disease, resp disease, liver disease) and store the result in a new variable.
E.g. I know that 1,2,3,4,84 are all respiratory diseases. I also know that 5, 6, 7, 32, 56 are all cardiovascular diseases. I want to create a new variable called diagnosis.
diag_code | diagnosis |
---|---|
1 | "resp disease" |
2 | "resp disease" |
56 | "CVD disease" |
3 | "resp disease" |
4 | "resp disease" |
84 | "resp disease" |
5 | "CVD disease" |
6 | "CVD disease" |
7 | "CVD disease" |
32 | "CVD disease" |
I have tried to use case_when() and mutate(), or ifelse() and mutate(), but they usually involve a single true or false condition.
I want to be able to do something like this (I know this is incorrect):
data <- data %>%
mutate(diagnosis = case_when(diag_code==c(1,2,3,5,84)) ~ "Resp disease",
case_when(diag_code==c(5,6,7,32,56)) ~ "CVD disease",
TRUE ~ "Unknown)
Answer:
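Two things need correcting to make it work. First, use a single case_when() call: each `condition ~ value` pair is one argument to that call. Second, to test whether a value belongs to a set of values, use %in% instead of ==, because == compares element by element (recycling the shorter vector) rather than checking membership. It should then look like this: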
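library(dplyr)
data <- data %>%
  mutate(diagnosis = case_when(
    diag_code %in% c(1, 2, 3, 4, 84) ~ "Resp disease",   # respiratory codes
    diag_code %in% c(5, 6, 7, 32, 56) ~ "CVD disease",   # cardiovascular codes
    TRUE ~ "Unknown"                                     # fallback for any other code
  ))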
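If the number of categories grows, a lookup table may be easier to maintain than a long case_when(). The following is only a sketch in base R, not part of the original answer: the toy data frame, the made-up code 99, and the names resp_codes, cvd_codes and lookup are all assumptions for illustration. It also shows why == does not work for membership tests.

```r
# Toy data: the codes from the question plus 99, a hypothetical unlisted code.
data <- data.frame(diag_code = c(1, 2, 56, 3, 4, 84, 5, 6, 7, 32, 99))

# == compares position by position; %in% tests membership in a set.
eq_result <- c(2, 1) == c(1, 2)     # compares 2 with 1, then 1 with 2
in_result <- c(2, 1) %in% c(1, 2)   # asks "is each value anywhere in the set?"

# Code groups taken from the question.
resp_codes <- c(1, 2, 3, 4, 84)
cvd_codes  <- c(5, 6, 7, 32, 56)

# One row per code, with its category (hypothetical lookup table).
lookup <- data.frame(
  diag_code = c(resp_codes, cvd_codes),
  diagnosis = c(rep("Resp disease", length(resp_codes)),
                rep("CVD disease", length(cvd_codes))),
  stringsAsFactors = FALSE
)

# match() returns the row of lookup for each code, or NA if the code is not listed.
data$diagnosis <- lookup$diagnosis[match(data$diag_code, lookup$diag_code)]

# Codes missing from the lookup table fall back to "Unknown".
data$diagnosis[is.na(data$diagnosis)] <- "Unknown"

print(data)
```

With this layout, adding a new category (for example liver disease) means adding rows to the lookup table rather than another branch in the code.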