HF$anaemia = as.factor(HF$anaemia)
HF$diabetes = factor(HF$diabetes,levels=c(0,1),labels=c("Absent","Present"))
HF$hypertension = factor(HF$high_blood_pressure,levels=c(0,1),labels=c("Absent","Present"))
HF$sex = factor(HF$sex,levels=c(0,1),labels=c("Female","Male"))
HF$smoking = factor(HF$smoking,levels=c(0,1),labels=c("No","Yes"))
HF$DEATH_EVENT = as.factor(HF$DEATH_EVENT)
HF <- select(HF, -high_blood_pressure)
nrow(HF)
[1] 299
This is the code I ran to recode my variables before I began analyzing my current dataset. I'm at a point where I want to perform Cox regression using My.stepwise::My.stepwise.coxph(), which requires all my variables to be numeric, the way they were before I ran the code above.
I don't have industry experience yet and would like to know whether the proper way to get my variables back to numeric form is to convert them manually, or whether making a copy of the original dataset is good enough.
Could the answer possibly be that "yes, it'd be ideal to revert back manually with a large dataset but since this one only has 299 observations, it's fine to make a copy"?
CodePudding user response:
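If we need the factor columns to be integer, an option is to loop over the factor columns and use as.integer. Note that as.integer on a factor returns the internal level codes (1, 2, ...), not the original 0/1 values.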
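HF2 <- HF
# logical index marking which columns are factors
i1 <- sapply(HF, is.factor)
# convert only the factor columns to their integer level codes
HF2[i1] <- lapply(HF[i1], as.integer)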
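One caveat with converting factors this way: as.integer returns a factor's internal level codes, so a variable that started as 0/1 comes back as 1/2. The following is a minimal sketch of how the original 0/1 coding could be recovered. The data frame `toy` and its values are made up for illustration and are not the real HF data. It assumes every factor was built from a 0/1 column with the levels in the order 0, 1, as in the question.

```r
# Hypothetical toy data mimicking two of the recoded columns in the question
toy <- data.frame(
  anaemia = as.factor(c(0, 1, 1, 0)),
  sex     = factor(c(1, 0, 1, 1), levels = c(0, 1), labels = c("Female", "Male"))
)

# as.integer() gives the level codes 1/2, not the original 0/1
codes <- lapply(toy, as.integer)

# Subtracting 1 maps the codes back to 0/1 (assumes levels were ordered 0, 1)
toy_num <- toy
is_fac <- sapply(toy, is.factor)
toy_num[is_fac] <- lapply(toy[is_fac], function(f) as.integer(f) - 1L)
```

If the untouched data are still available, keeping a separate numeric copy of the data frame before recoding avoids the conversion altogether, and that should hold regardless of how many rows the dataset has.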
CodePudding user response:
With dplyr:
library(tidyverse)
# convert every factor column to its numeric level codes (1, 2, ...)
HF2 <- HF %>%
  mutate(across(where(is.factor), as.numeric))
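Because as.numeric on a factor yields level codes rather than the values the factor was built from, a dplyr pipeline that restores the 0/1 coding might look like the sketch below. The data frame `toy` is hypothetical, and the code assumes each factor has exactly two levels that were defined in the order 0, 1.

```r
library(dplyr)

# Hypothetical toy data; the real HF data frame is not reproduced here
toy <- data.frame(
  time     = c(4, 6, 7),
  smoking  = factor(c(0, 1, 0), levels = c(0, 1), labels = c("No", "Yes")),
  diabetes = factor(c(1, 1, 0), levels = c(0, 1), labels = c("Absent", "Present"))
)

# Convert each factor to its level code and shift 1/2 back to 0/1;
# non-factor columns such as `time` are left untouched
toy_num <- toy %>%
  mutate(across(where(is.factor), ~ as.integer(.x) - 1L))
```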