Convert numerical variable to categorical variable

I have several columns that contain 0 and 1 as values. Right now they are treated as numerical variables, but I want them to be treated as categorical.

I tried

as.factor(df[,"diseasesA":"diseaseM"], exclude = NULL)

but received the following error message:

Error in as.factor(df[,"diseasesA":"diseaseM"],  : 
  unused argument (exclude = NULL)

Leaving out "exclude = NULL" gave me the following error message:

Error in "diseasesA":"diseaseM" : NA/NaN argument
In addition: Warning messages:
1: In eval(jsub, setattr(as.list(seq_along(x)), "names", names_x),  :
  NAs introduced by coercion
2: In eval(jsub, setattr(as.list(seq_along(x)), "names", names_x),  :
  NAs introduced by coercion

Answer:

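factor() and as.factor() work on a single vector (one column), not on a whole data frame, so you need to apply the function to each column you want to convert. This also explains the first error: as.factor() takes only the vector itself, while factor() is the one with an exclude argument. The second error comes from "diseasesA":"diseaseM": the : operator builds numeric sequences, so the two column names are coerced to NA. Here are a few equivalent methods: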

cols = paste0("disease", LETTERS[1:13]) # assuming your naming pattern is consistent

## base R with lapply
df[cols] = lapply(df[cols], factor)

## base R with for loop
for(col in cols) {
  # index by column name; df[[i]] would convert the first length(cols) columns instead
  df[[col]] = factor(df[[col]])
}

## dplyr
library(dplyr)
df = df %>%
  mutate(across(diseaseA:diseaseM, factor))
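To check the lapply approach, here is a small self-contained run on made-up data (the columns and values are hypothetical). It also reproduces the first error from the question, and passes exclude = NULL through lapply, which is presumably what the original attempt was after: missing values become their own factor level instead of being left out of the levels.

```r
# Hypothetical toy data: an ID column plus three 0/1 disease columns
df <- data.frame(
  id       = 1:4,
  diseaseA = c(0, 1, 1, 0),
  diseaseB = c(1, 0, NA, 1),
  diseaseC = c(0, 0, 1, 1)
)
cols <- c("diseaseA", "diseaseB", "diseaseC")

# as.factor() has no 'exclude' argument, so this fails with
# "unused argument (exclude = NULL)", just as in the question
msg <- tryCatch(as.factor(df$diseaseB, exclude = NULL),
                error = conditionMessage)

# lapply() forwards extra arguments to FUN, so each column is converted
# with factor(column, exclude = NULL) and NA is kept as its own level
df[cols] <- lapply(df[cols], factor, exclude = NULL)

sapply(df, is.factor)   # id is unchanged; the disease columns are factors
levels(df$diseaseB)     # "0" "1" NA
```

Drop exclude = NULL if you would rather have missing values stay plain NA rather than become a level.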

Note that your question is inconsistent in its column naming, disease vs. diseases. In the base R methods I assumed that's a typo and that you want to convert columns diseaseA, diseaseB, diseaseC, ..., diseaseM. In dplyr, across() accepts tidyselect syntax, so X:Z operates on every column from X through Z in the data frame's column order, whatever their names. There are many other ways to select which columns to work on, e.g., starts_with("disease").
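If the disease columns are not adjacent, or you would rather not rely on their order, the columns can also be picked in base R. The sketch below uses hypothetical data and shows two options: selecting by name prefix with startsWith() (roughly the base R counterpart of starts_with()), and, as an alternative not in the answer above, selecting numeric columns whose values are only 0, 1 or NA.

```r
# Hypothetical data whose disease columns are not adjacent
df <- data.frame(
  diseaseA = c(0, 1, 0),
  age      = c(34, 51, 47),
  diseaseB = c(1, NA, 0)
)

# Option 1: by name prefix
by_name <- names(df)[startsWith(names(df), "disease")]

# Option 2: by content, i.e. numeric columns holding only 0, 1 or NA
is_binary <- vapply(df, function(x) is.numeric(x) && all(x %in% c(0, 1, NA)),
                    logical(1))
by_value <- names(df)[is_binary]

# Convert the selected columns; age stays numeric
df[by_name] <- lapply(df[by_name], factor)

sapply(df, class)
```

The content-based rule is only a heuristic: any other numeric column that happens to contain nothing but 0 and 1 would be picked up as well.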