Change a column to a factor only if it is numeric and has fewer than 5 distinct values.
How can I do this in R with dplyr?
I know I should use mutate_if and is.numeric(), but I'm not sure how to put everything together.
df |> mutate_if(<code here>, as.factor)
CodePudding user response:
There may be more elegant solutions, but in base R you could create a logical vector flagging the columns that meet both criteria, then apply as.factor to those columns:
Data (two numeric columns, of which only col2 meets both conditions):
set.seed(123)
n <- 100
df <- data.frame(col1 = paste0(sample(LETTERS, n, replace = TRUE),
                               sample(LETTERS, n, replace = TRUE)),
                 col2 = sample(2005:2008, n, replace = TRUE),
                 col3 = runif(n))
Code
# logical vector
conv <- sapply(df, function(x) is.numeric(x) && length(unique(x)) < 5)
# Convert to factors
df[conv] <- lapply(df[conv], as.factor)
Test
str(df)
# 'data.frame': 100 obs. of 3 variables:
# $ col1: chr "OO" "SU" "NE" "CH" ...
# $ col2: Factor w/ 4 levels "2005","2006",..: 4 3 1 1 4 4 2 1 4 2 ...
# $ col3: num 0.225 0.486 0.37 0.983 0.388 ...
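If you need this in more than one place, the same idea can be wrapped in a small helper so the threshold is not hard-coded. This is only a sketch: the function name to_factor_if_few, its max_distinct argument, and the example data are made up for illustration and are not part of base R.

```r
# Hypothetical helper (name and argument are illustrative):
# convert numeric columns with fewer than `max_distinct` distinct values to factors.
to_factor_if_few <- function(data, max_distinct = 5) {
  # TRUE for columns that are numeric AND have fewer than max_distinct unique values
  conv <- vapply(
    data,
    function(x) is.numeric(x) && length(unique(x)) < max_distinct,
    logical(1)
  )
  # Replace only the qualifying columns; everything else is left untouched
  data[conv] <- lapply(data[conv], as.factor)
  data
}

# Small made-up data frame for this sketch
example_df <- data.frame(
  id    = c("a", "b", "c", "d", "e", "f"),
  year  = c(2005, 2006, 2005, 2007, 2006, 2005),
  value = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

result <- to_factor_if_few(example_df)
sapply(result, class)
```

Here year has three distinct values and becomes a factor, while value has six and stays numeric. vapply() is used instead of sapply() so the result is guaranteed to be a logical vector.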
CodePudding user response:
mutate_if has been superseded, and across() is now preferred. Using mtcars as sample data (all of its columns start out numeric):
library(dplyr)

mtcars |>
  mutate(across(where(\(x) is.numeric(x) && n_distinct(x) < 5), factor)) |>
  ## looking at the top of the results:
  head() |>
  as_tibble()
# # A tibble: 6 × 11
#     mpg cyl    disp    hp  drat    wt  qsec vs    am    gear   carb
#   <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <dbl>
# 1  21   6       160   110  3.9   2.62  16.5 0     1     4         4
# 2  21   6       160   110  3.9   2.88  17.0 0     1     4         4
# 3  22.8 4       108    93  3.85  2.32  18.6 1     1     4         1
# 4  21.4 6       258   110  3.08  3.22  19.4 1     0     3         1
# 5  18.7 8       360   175  3.15  3.44  17.0 0     0     3         2
# 6  18.1 6       225   105  2.76  3.46  20.2 1     0     3         1
We can see that four columns (cyl, vs, am and gear) have been converted to factors. carb has six distinct values, so it stays numeric.
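Since every column of mtcars is numeric, it does not show what the is.numeric() check is for. A minimal sketch on mixed column types, assuming a made-up data frame called mixed (the data and column names are hypothetical):

```r
library(dplyr)

# Made-up data: a character column, a low-cardinality numeric column,
# and a continuous numeric column
mixed <- data.frame(
  grp   = c("x", "y", "x", "y", "x", "y"),
  year  = c(2005, 2006, 2005, 2007, 2006, 2005),
  score = c(0.11, 0.25, 0.37, 0.48, 0.52, 0.69)
)

# where() calls the predicate once per column; a column is converted only
# if it is numeric and has fewer than 5 distinct values
converted <- mixed |>
  mutate(across(where(\(x) is.numeric(x) && n_distinct(x) < 5), as.factor))

sapply(converted, class)
```

grp has only two distinct values but is a character column, so it is left alone. year is converted, and score stays numeric because it has six distinct values.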