Should I revert factors to numeric form manually?

HF$anaemia = as.factor(HF$anaemia)
HF$diabetes = factor(HF$diabetes,levels=c(0,1),labels=c("Absent","Present"))
HF$hypertension = factor(HF$high_blood_pressure,levels=c(0,1),labels=c("Absent","Present"))
HF$sex = factor(HF$sex,levels=c(0,1),labels=c("Female","Male"))
HF$smoking = factor(HF$smoking,levels=c(0,1),labels=c("No","Yes"))
HF$DEATH_EVENT = as.factor(HF$DEATH_EVENT)

library(dplyr)  # select() comes from dplyr
HF <- select(HF, -high_blood_pressure)

nrow(HF)
[1] 299

This is the code I ran to prepare my variables before analyzing my dataset. I now want to perform Cox regression using My.stepwise::My.stepwise.coxph(), which requires all my variables to be numeric, the way they were before I ran the code above.

I don't have industry experience yet and would like to know whether the proper way to convert my variables back to numeric form is to do so manually, or whether making a copy of the original dataset is good enough.

Could the answer possibly be that "yes, it'd be ideal to revert back manually with a large dataset but since this one only has 299 observations, it's fine to make a copy"?

CodePudding user response：

If the factor columns need to be integers, one option is to loop over the factor columns and apply as.integer:

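HF2 <- HF                               # work on a copy, keep HF intact
i1 <- sapply(HF, is.factor)             # TRUE for each factor column
HF2[i1] <- lapply(HF[i1], as.integer)   # convert only the factor columns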
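
Note that as.integer on a factor returns the internal level codes (1 and 2 for a two-level factor), so the loop above does not by itself give back the original 0/1 coding. Here is a minimal sketch of the difference. It uses a toy data frame with made-up values in place of HF, and it assumes every factor is binary with its levels in 0-then-1 order, as in the question's setup code. The names toy, raw, codes and restored are only for illustration.

```r
# Toy stand-in for HF (hypothetical values, not the real data)
toy <- data.frame(
  age     = c(60, 75, 52, 48),
  anaemia = c(0, 1, 1, 0),
  sex     = c(1, 0, 1, 1)
)
raw <- toy  # untouched copy that keeps the original numeric coding

# Same style of conversion as in the question
toy$anaemia <- as.factor(toy$anaemia)
toy$sex     <- factor(toy$sex, levels = c(0, 1), labels = c("Female", "Male"))

is_fac <- sapply(toy, is.factor)

# as.integer() on a factor returns level codes, i.e. 1/2 rather than 0/1
codes <- toy
codes[is_fac] <- lapply(toy[is_fac], as.integer)

# Subtracting 1 restores 0/1, assuming the levels are ordered 0 then 1
restored <- toy
restored[is_fac] <- lapply(toy[is_fac], function(x) as.integer(x) - 1L)
```

If an untouched copy like raw is kept from the start, using it directly for the model is arguably the simplest route, whatever the size of the dataset; the back-conversion is mainly useful when the original coding is no longer at hand.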

CodePudding user response：

With dplyr, all factor columns can be converted in one step:

library(tidyverse)

# Assign the result; mutate() does not modify HF in place
HF2 <- HF %>%
  mutate(across(where(is.factor), as.numeric))
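
As with as.integer, as.numeric applied to a factor yields the level codes (1, 2), not the 0/1 values the columns started with. A possible variant that shifts the codes back is sketched below. It is run on a small made-up data frame rather than HF, and it assumes each factor has exactly two levels stored in 0-then-1 order. The names toy and toy_num are hypothetical.

```r
library(dplyr)

# Toy stand-in for HF (hypothetical values, not the real data)
toy <- data.frame(
  age         = c(60, 75, 52),
  smoking     = factor(c(0, 1, 0), levels = c(0, 1), labels = c("No", "Yes")),
  DEATH_EVENT = as.factor(c(1, 0, 1))
)

# Level codes are 1/2, so subtract 1 to return to 0/1
# (only valid under the two-level, 0-then-1 assumption above)
toy_num <- toy %>%
  mutate(across(where(is.factor), ~ as.integer(.x) - 1L))
```

Non-factor columns such as age pass through unchanged, because where(is.factor) selects only the factor columns.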