I have several columns that contain only 0 and 1 as values. Right now they are treated as numeric variables, but I want them to be treated as categorical.
I tried
as.factor(df[,"diseasesA":"diseaseM"], exclude = NULL)
but received the following error message:
Error in as.factor(df[,"diseasesA":"diseaseM"], :
unused argument (exclude = NULL)
Without exclude = NULL, I got this error message instead:
Error in "diseasesA":"diseaseM" : NA/NaN argument
In addition: Warning messages:
1: In eval(jsub, setattr(as.list(seq_along(x)), "names", names_x), :
NAs introduced by coercion
2: In eval(jsub, setattr(as.list(seq_along(x)), "names", names_x), :
NAs introduced by coercion
Answer:
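factor() or as.factor() works on a single column (a vector), not on a data frame, so you need to apply the function to each column you want to convert. This also explains your errors: as.factor() takes no argument besides the vector, hence "unused argument (exclude = NULL)" (factor() does accept exclude). The : operator only works on numbers, so "diseasesA":"diseaseM" coerces both strings to NA and fails with "NA/NaN argument". Here are a few equivalent methods: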
cols = paste0("disease", LETTERS[1:13]) # assuming your naming pattern is consistent
## base R with lapply
df[cols] = lapply(df[cols], factor)
## base R with a for loop over the column names
for (col in cols) {
  df[[col]] = factor(df[[col]])
}
## dplyr
library(dplyr)
df = df %>%
mutate(across(diseaseA:diseaseM, factor))
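To make the single-column behavior concrete, here is a small sketch on made-up data. The data frame, its id column and the NA in diseaseB are hypothetical; the point is that factor() handles one vector at a time, that exclude = NULL belongs to factor() and keeps NA as a level, and that lapply() forwards extra arguments to factor() for every column.

```r
# Hypothetical data: an id column plus 0/1 columns diseaseA..diseaseM, with one NA
cols = paste0("disease", LETTERS[1:13])
df = data.frame(id = 1:4)
for (col in cols) df[[col]] = c(0, 1, 1, 0)
df$diseaseB[2] = NA

# factor() works on one vector; NA is left out of the levels by default
levels(factor(df$diseaseB))                  # "0" "1"

# exclude = NULL is an argument of factor() (not as.factor()); it keeps NA as a level
levels(factor(df$diseaseB, exclude = NULL))  # "0" "1" NA

# Extra arguments to lapply() are passed on to factor() for each selected column
df[cols] = lapply(df[cols], factor, exclude = NULL)

# id stays integer, the disease columns are now factors
sapply(df[c("id", "diseaseA", "diseaseB")], class)
```

If a column happens to contain only 0s, factor() gives it a single level; passing levels = c(0, 1) the same way (lapply(df[cols], factor, levels = c(0, 1))) keeps both levels in every column.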
I will note that your question is inconsistent in its column naming pattern, disease vs. diseases. In the base R methods I assumed that's a typo and further assumed you want to convert columns diseaseA, diseaseB, diseaseC, ..., diseaseM. In dplyr, across() lets you write X:Z to operate on every column from X through Z, in the order they appear in the data frame, but there are many other ways to select which columns to work on, e.g., starts_with("disease").
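If you prefer to stay in base R but select columns the way dplyr does, a minimal sketch is below. The data frame is hypothetical and uses only diseaseA to diseaseC for brevity; match() reproduces the positional X:Z range and startsWith() (base R 3.3.0 or later) mimics starts_with().

```r
# Hypothetical data frame; column order matters for the range selection
df = data.frame(id = 1:3, diseaseA = c(0, 1, 0), diseaseB = c(1, 1, 0),
                diseaseC = c(0, 0, 1), age = c(30, 41, 52))

# Like diseaseA:diseaseC in dplyr: every column from diseaseA through diseaseC by position
range_cols = names(df)[match("diseaseA", names(df)):match("diseaseC", names(df))]

# Like starts_with("disease"): every column whose name begins with "disease"
prefix_cols = names(df)[startsWith(names(df), "disease")]

# Convert the selected columns; id and age stay numeric
df[prefix_cols] = lapply(df[prefix_cols], factor)
```

The prefix version is less fragile if columns get reordered, while the range version matches dplyr's X:Z exactly, including any unrelated column that sits between the two endpoints.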