For loop converting NA in factor variables into "None"

Time:09-21

I want to convert the NAs in my factor variables into a string "None" that will be a level in my data set.

I have tried:

for ( col in 1:ncol(data)){
  class(data$col) == "factor"
  data$col = addNA(data$col)
  levels(data$col) <- c(levels(data$col), "None")
  print(summary(data))
}

And I got this error:

Unknown or uninitialised column: `col`.Unknown or uninitialised column: `col`.Error: Assigned data `addNA(cdata$col)` must be compatible with existing data.
x Existing data has 1000 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.

What is the problem with this approach? And is there a better way to do this for all factor columns at once, rather than one column at a time?

Answer:

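We can loop across the factor columns and convert the NAs to "None" with fct_explicit_na() from forcats. (In forcats 1.0.0 and later, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(), which takes the new label as its level argument.)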

library(dplyr)
library(forcats)

# for every factor column, turn NA into an explicit "None" level
data <- data %>%
  mutate(across(where(is.factor), ~ fct_explicit_na(., na_level = "None")))

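There are several issues with the for loop:

  1. class(data$col) == "factor" is evaluated, but its result is thrown away; it has to be the condition of an if (...) statement. is.factor() is safer still, because class() returns two values for an ordered factor.
  2. data$col is wrong: `$` does not evaluate col, it looks for a column literally named "col". That column does not exist, so data$col is NULL, addNA() returns a zero-length factor, and the assignment fails with "Assigned data has 0 rows". Use data[[col]] instead.
  3. Appending "None" to the levels does not relabel the existing NAs; they stay NA and "None" is just an unused level. The NA level created by addNA() has to be renamed to "None" instead.
  4. summary(data) only needs to be printed once, after the loop.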
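
for (col in seq_along(data)) {
  # is.factor() also works for ordered factors, where class() has length 2
  if (is.factor(data[[col]])) {
    # make NA an explicit level, then rename that level to "None"
    data[[col]] <- addNA(data[[col]])
    levels(data[[col]])[is.na(levels(data[[col]]))] <- "None"
  }
}

print(summary(data))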
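The pitfalls in the original loop can be reproduced on a small data frame. This is only a sketch: the data frame, its column names (grade, size, n) and values are hypothetical stand-ins for the asker's `data`, which was not shown.

```r
# Hypothetical toy data standing in for the asker's `data`
data <- data.frame(
  grade = factor(c("a", NA, "b")),
  size  = factor(c("S", "L", NA), levels = c("S", "L"), ordered = TRUE),
  n     = c(1, 2, NA)
)

col <- 1
# `$` does not evaluate `col`; it looks for a column literally named "col".
# A data.frame silently returns NULL (a tibble also warns "Unknown or
# uninitialised column")
is.null(data$col)                      # TRUE
# `[[` evaluates `col`, so this is the first column
identical(data[[col]], data$grade)     # TRUE

# An ordered factor has two classes, so this comparison has length 2
# (used as an if() condition it is an error in R >= 4.2.0)
class(data$size) == "factor"           # FALSE TRUE
is.factor(data$size)                   # TRUE

# Appending "None" to the levels does not relabel the missing values
x <- addNA(data$grade)
levels(x) <- c(levels(x), "None")
any(x == "None", na.rm = TRUE)         # FALSE

# Renaming the NA level created by addNA() does
y <- addNA(data$grade)
levels(y)[is.na(levels(y))] <- "None"
as.character(y)                        # "a" "None" "b"
```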

Answer:

Here is an alternative way:

  1. Identify which columns are factors.
  2. Add "None" to the levels of each factor.
  3. Replace the NAs with "None".

Here is an example with a mock dataset:

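# identify which columns are factors
x <- sapply(df, is.factor)

# df[x] (not df[, x]) keeps a list of columns even when only one is selected
df[x] <- lapply(df[x], function(f) {
  levels(f) <- c(levels(f), "None")  # add "None" as a level first
  replace(f, is.na(f), "None")       # then fill the NAs with it
})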

Output:

  a     b         c
  <fct> <fct> <dbl>
1 1     None      2
2 None  3        NA
3 4     None     NA

Data:

df <- structure(list(a = structure(c(1L, NA, 2L), .Label = c("1", "4"
), class = "factor"), b = structure(c(NA, 1L, NA), .Label = "3", class = "factor"), 
c = c(2, NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))
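The mock data above is a tibble. With a plain data.frame, `df[, x]` drops to a bare vector when only one column is selected, so lapply() would iterate over the values instead of the columns. The sketch below shows this on a hypothetical one-factor data.frame (df2 is an illustrative name, not from the answer) and selects the columns with `df2[x]` instead.

```r
# Hypothetical plain data.frame with a single factor column
df2 <- data.frame(a = factor(c("1", NA, "4")), c = c(2, NA, NA))
x <- sapply(df2, is.factor)

# one selected column: `[, x]` drops to a bare factor
is.data.frame(df2[, x])    # FALSE
# `[x]` always returns a data.frame, i.e. a list of columns
is.data.frame(df2[x])      # TRUE

df2[x] <- lapply(df2[x], function(f) {
  levels(f) <- c(levels(f), "None")  # add "None" as a level first
  replace(f, is.na(f), "None")       # then fill the NAs with it
})
as.character(df2$a)        # "1" "None" "4"
```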