Context
I have a dataframe df with 3 continuous variables: a, b and c.
I want to convert a, b and c to categorical variables a1, b1 and c1, using the cutoff values stored in df_cutoff.
Right now I can only do this in an inefficient way, one variable at a time.
In my real data there are many more variables like a1, b1 and c1 to create, so I cannot mutate them one by one.
Question
Is there a way to create a1, b1 and c1 more efficiently?
Reproducible code
df = data.frame(a = 1:5,
                b = 6:10,
                c = 11:15)
# > df
# a b c
# 1 1 6 11
# 2 2 7 12
# 3 3 8 13
# 4 4 9 14
# 5 5 10 15
df_cutoff = data.frame(var = c('a', 'b', 'c'),
                       cutoff = c(2.5, 7.5, 12.5))
# > df_cutoff
# var cutoff
# 1 a 2.5
# 2 b 7.5
# 3 c 12.5
library(dplyr)  # provides %>% and mutate()
# Look up each variable's cutoff by hand
a_cutoff = subset(df_cutoff, var == 'a')[,'cutoff', drop = TRUE]
b_cutoff = subset(df_cutoff, var == 'b')[,'cutoff', drop = TRUE]
c_cutoff = subset(df_cutoff, var == 'c')[,'cutoff', drop = TRUE]
# Label each value 'low' if it is at or below its cutoff, otherwise 'high'
df %>% mutate(a2 = ifelse(a <= a_cutoff, 'low', 'high'),
              b2 = ifelse(b <= b_cutoff, 'low', 'high'),
              c2 = ifelse(c <= c_cutoff, 'low', 'high'))
# a b c a2 b2 c2
# 1 1 6 11 low low low
# 2 2 7 12 low low low
# 3 3 8 13 high high high
# 4 4 9 14 high high high
# 5 5 10 15 high high high
CodePudding user response:
With across(), match cur_column() against df_cutoff$var to get the cutoff for the current column, then use ifelse() to assign the correct value.
library(dplyr)
df %>%
  mutate(across(a:c,
                ~ ifelse(.x <= df_cutoff$cutoff[match(cur_column(), df_cutoff$var)],
                         'low', 'high'),
                .names = "{col}2"))
# a b c a2 b2 c2
# 1 1 6 11 low low low
# 2 2 7 12 low low low
# 3 3 8 13 high high high
# 4 4 9 14 high high high
# 5 5 10 15 high high high
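If dplyr is not available, the same per-column lookup can be sketched in base R. This is only an illustration under assumptions: the helper names cutoffs and vars are made up for the example, and every name in df_cutoff$var is assumed to exist as a column of df.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
df_cutoff <- data.frame(var = c("a", "b", "c"),
                        cutoff = c(2.5, 7.5, 12.5))

# Named vector so each cutoff can be looked up by column name
# ('cutoffs' and 'vars' are hypothetical helper names)
cutoffs <- setNames(df_cutoff$cutoff, df_cutoff$var)
vars <- names(cutoffs)

# Pair each column with its own cutoff and label the values;
# the new columns get the original name plus the suffix "2"
df[paste0(vars, "2")] <- Map(
  function(x, cut) ifelse(x <= cut, "low", "high"),
  df[vars], cutoffs[vars]
)
df
```

Because the cutoffs are matched by name, the order of the rows in df_cutoff does not have to follow the order of the columns in df.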
CodePudding user response:
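Another approach, which compares every column with its cutoff in one step. Note that the result is TRUE/FALSE rather than 'low'/'high', the new columns reuse the names a, b and c, and the rows of df_cutoff must be in the same order as the columns of df: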
cbind(df, df <= rep(df_cutoff$cutoff, each = nrow(df)))
# a b c a b c
# 1 1 6 11 TRUE TRUE TRUE
# 2 2 7 12 TRUE TRUE TRUE
# 3 3 8 13 FALSE FALSE FALSE
# 4 4 9 14 FALSE FALSE FALSE
# 5 5 10 15 FALSE FALSE FALSE
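To get 'low'/'high' labels and distinct column names from this vectorised comparison, something along these lines might work. It is a sketch, not a tested recipe for real data: the names cutoffs, is_low, labels and out are invented for the example, and match() is used so that the order of rows in df_cutoff no longer matters.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
df_cutoff <- data.frame(var = c("a", "b", "c"),
                        cutoff = c(2.5, 7.5, 12.5))

# Reorder the cutoffs to follow the column order of df
cutoffs <- df_cutoff$cutoff[match(names(df), df_cutoff$var)]

# Compare all columns at once: each cutoff is repeated once per row,
# so it lines up with its own column
is_low <- df <= rep(cutoffs, each = nrow(df))

# Turn the logical matrix into labels and give the columns new names
labels <- as.data.frame(ifelse(is_low, "low", "high"),
                        stringsAsFactors = FALSE)
names(labels) <- paste0(names(df), "2")

out <- cbind(df, labels)
out
```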