Convert to factor if column is numeric and has less than 5 distinct values?

Time:08-04

I want to convert a column to a factor only if it is numeric and has fewer than 5 distinct values. How can I do this in R with dplyr?

I know I should use mutate_if and is.numeric(), but I'm not sure how to put everything together.

df |> mutate_if(<code here>, as.factor)

CodePudding user response:

There may be more elegant solutions, but in base R you could create a logical vector flagging the columns that meet both criteria, then apply as.factor to those columns:

Data (two numeric columns, only one of which, col2, meets both conditions):

set.seed(123)
n <- 100
df <- data.frame(col1 = paste0(sample(LETTERS, n, replace = TRUE),
                               sample(LETTERS, n, replace = TRUE)),
                 col2 = sample(2005:2008, n, replace = TRUE),
                 col3 = runif(n))

Code

# logical vector
conv <- sapply(df, function(x) is.numeric(x) && length(unique(x)) < 5)

# Convert to factors
df[conv] <- lapply(df[conv], as.factor)

Test

str(df)
# 'data.frame': 100 obs. of  3 variables:
# $ col1: chr  "OO" "SU" "NE" "CH" ...
# $ col2: Factor w/ 4 levels "2005","2006",..: 4 3 1 1 4 4 2 1 4 2 ...
# $ col3: num  0.225 0.486 0.37 0.983 0.388 ...
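
The same two steps can be wrapped in a small reusable function. This is only a sketch: the name factor_if_few, the max_levels argument, and the toy data frame are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: convert numeric columns with fewer than `max_levels`
# distinct values to factors; all other columns are returned unchanged.
factor_if_few <- function(data, max_levels = 5) {
  # One TRUE/FALSE per column; vapply() guarantees a logical vector
  conv <- vapply(
    data,
    function(x) is.numeric(x) && length(unique(x)) < max_levels,
    logical(1)
  )
  data[conv] <- lapply(data[conv], as.factor)
  data
}

# Made-up example data: `year` has 4 distinct values, `value` has 6
toy <- data.frame(
  name  = c("a", "b", "c", "d", "e", "f"),
  year  = c(2005L, 2006L, 2005L, 2007L, 2006L, 2008L),
  value = c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5)
)

out <- factor_if_few(toy)
sapply(out, class)
```

Note that unique() treats NA as a distinct value, so a column with missing values may cross the threshold sooner than expected.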

CodePudding user response:

mutate_if is now superseded (it still works, but is no longer the recommended approach) and across() is preferred. Using mtcars as sample data (all of its columns start out numeric):

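library(dplyr)

mtcars |>
  # check the type first; && short-circuits, so n_distinct() only runs on numeric columns
  mutate(across(where(\(x) is.numeric(x) && n_distinct(x) < 5), factor)) |>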
  ## looking at the top of the results:
  head() |>
  as_tibble()
# # A tibble: 6 × 11
#     mpg cyl    disp    hp  drat    wt  qsec vs    am    gear   carb
#   <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <dbl>
# 1  21   6       160   110  3.9   2.62  16.5 0     1     4         4
# 2  21   6       160   110  3.9   2.88  17.0 0     1     4         4
# 3  22.8 4       108    93  3.85  2.32  18.6 1     1     4         1
# 4  21.4 6       258   110  3.08  3.22  19.4 1     0     3         1
# 5  18.7 8       360   175  3.15  3.44  17.0 0     0     3         2
# 6  18.1 6       225   105  2.76  3.46  20.2 1     0     3         1

We can see that cyl, vs, am and gear have been converted to factors. carb stays numeric because it has six distinct values.
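
For completeness, here is a minimal sketch that fills in the placeholder from the question with mutate_if and compares it with the across() form. The data frame and the predicate name few_numeric are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Made-up data: only `year` is numeric with fewer than 5 distinct values
df <- data.frame(
  id    = c("a", "b", "c", "d", "e", "f"),
  year  = c(2005, 2006, 2005, 2007, 2006, 2005),
  score = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Hypothetical predicate: type check first, then the distinct-value count
few_numeric <- function(x) is.numeric(x) && n_distinct(x) < 5

# Older scoped-verb form, as asked in the question
res_if <- df |> mutate_if(few_numeric, as.factor)

# across() + where() form
res_across <- df |> mutate(across(where(few_numeric), as.factor))

# Inspect the resulting column classes
sapply(res_if, class)
sapply(res_across, class)
```

Defining the predicate once as a named function keeps the two calls comparable and makes the threshold easy to change in one place.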
