I want to convert the NAs in my factor variables into the string "None", which should become a level in my data set.
I have tried:
for ( col in 1:ncol(data)){
class(data$col) == "factor"
data$col = addNA(data$col)
levels(data$col) <- c(levels(data$col), "None")
print(summary(data))
}
And I got this error:
Unknown or uninitialised column: `col`.
Unknown or uninitialised column: `col`.
Error: Assigned data `addNA(cdata$col)` must be compatible with existing data.
x Existing data has 1000 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
What is the problem with this approach? Is there a better way to do this for all factor columns at once, rather than handling each column separately?
CodePudding user response:
We can loop across the columns that are factors and convert the NAs to "None" using fct_explicit_na from forcats:
library(dplyr)
library(forcats)
data <- data %>%
mutate(across(where(is.factor), ~ fct_explicit_na(., na_level = "None")))
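In forcats 1.0.0 and later, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(). Below is a minimal sketch of the same idea with the newer function, assuming forcats >= 1.0.0 and dplyr are installed. The mock data is made up for illustration and is not the asker's data.

```r
library(dplyr)
library(forcats)

# Mock data (an assumption): two factor columns with NAs, one numeric column
data <- tibble(
  a = factor(c("1", NA, "4")),
  b = factor(c(NA, "3", NA)),
  c = c(2, NA, NA)
)

# Turn NA into an explicit "None" level in every factor column;
# non-factor columns such as `c` are left untouched
data <- data %>%
  mutate(across(where(is.factor), ~ fct_na_value_to_level(.x, level = "None")))

print(data)
```

The numeric column keeps its NAs because where(is.factor) restricts the conversion to factor columns.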
In the for loop, there are multiple issues:
- class(data$col) == "factor" is evaluated, but it should be inside an if(...) condition
- data$col is wrong, as there is no column named col; it should be data[[col]] instead
- summary(data) can be checked outside the for loop
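for (col in seq_along(data)) {
  # only touch factor columns
  if (is.factor(data[[col]])) {
    # make NA an explicit level ...
    data[[col]] <- addNA(data[[col]])
    # ... then rename that NA level to "None"
    levels(data[[col]])[is.na(levels(data[[col]]))] <- "None"
  }
}
print(summary(data))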
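The difference between data$col and data[[col]] can be checked directly on a small example. This is a minimal sketch with a made-up base data.frame. On a tibble, data$col additionally produces the "Unknown or uninitialised column" warning seen in the question.

```r
# Mock data (an assumption): one factor column, one numeric column
data <- data.frame(f = factor(c("a", NA, "b")), n = c(1, 2, NA))
col <- 1

by_dollar  <- data$col     # looks for a column literally named "col"
by_bracket <- data[[col]]  # uses the value stored in col (here: column 1)

print(is.null(by_dollar))     # no such column, so nothing is returned
print(is.factor(by_bracket))  # the first column, which is a factor
```

Because data$col is NULL, addNA(data$col) has length 0, which is why the assignment fails with "Assigned data has 0 rows".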
CodePudding user response:
Here is an alternative way:
- Identify which columns are factors
- Add "None" to the levels of each factor
- Replace the NAs with "None"
Here is an example with a mock dataset:
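# identify which columns are factors
x <- sapply(df, is.factor)
# df[x] (rather than df[, x]) stays a data frame even with a single factor column
df[x] <- lapply(df[x], function(.){
  # register "None" as a level first, then overwrite the NAs with it
  levels(.) <- c(levels(.), "None")
  replace(., is.na(.), "None")
})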
output:
a b c
<fct> <fct> <dbl>
1 1 None 2
2 None 3 NA
3 4 None NA
data:
df <- structure(list(a = structure(c(1L, NA, 2L), .Label = c("1", "4"
), class = "factor"), b = structure(c(NA, 1L, NA), .Label = "3", class = "factor"),
c = c(2, NA, NA)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))