How to use mutate in a vectorized way with dplyr in R

Time:10-05

Context

I have a data frame df with three continuous variables: a, b and c.

I want to convert a, b and c into categorical variables a2, b2 and c2, using the cutoff values stored in df_cutoff.

Right now I can do this, but only in an inefficient way, with one line per variable.

In my real data there are many more variables that need to be converted in the same way.

I cannot write a mutate call for each one by hand.

Question

Is there a more efficient way to create all of these columns in one step?

Reproducible code


library(dplyr)  # needed for %>% and mutate() below

df = data.frame(a = 1:5,
                b = 6:10,
                c = 11:15)
# > df
#   a  b  c
# 1 1  6 11
# 2 2  7 12
# 3 3  8 13
# 4 4  9 14
# 5 5 10 15

df_cutoff = data.frame(var = c('a', 'b', 'c'),
                       cutoff = c(2.5, 7.5, 12.5))

# > df_cutoff
#   var cutoff
# 1   a    2.5
# 2   b    7.5
# 3   c   12.5

a_cutoff = subset(df_cutoff, var == 'a')[,'cutoff', drop = TRUE]
b_cutoff = subset(df_cutoff, var == 'b')[,'cutoff', drop = TRUE]
c_cutoff = subset(df_cutoff, var == 'c')[,'cutoff', drop = TRUE]

df %>% mutate(a2 = ifelse(a <= a_cutoff, 'low', 'high'),
              b2 = ifelse(b <= b_cutoff, 'low', 'high'),
              c2 = ifelse(c <= c_cutoff, 'low', 'high'))
#   a  b  c   a2   b2   c2
# 1 1  6 11  low  low  low
# 2 2  7 12  low  low  low
# 3 3  8 13 high high high
# 4 4  9 14 high high high
# 5 5 10 15 high high high

CodePudding user response:

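Use across() to apply the same function to every column. Inside it, cur_column() gives the name of the column being processed, so match() against df_cutoff$var finds that column's cutoff, and ifelse() assigns 'low' or 'high'.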

library(dplyr)
df %>% 
  mutate(across(a:c, ~ ifelse(.x <= df_cutoff$cutoff[match(cur_column(), df_cutoff$var)],
                       'low', 'high'),
                .names = "{.col}2"))

#   a  b  c   a2   b2   c2
# 1 1  6 11  low  low  low
# 2 2  7 12  low  low  low
# 3 3  8 13 high high high
# 4 4  9 14 high high high
# 5 5 10 15 high high high
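
A possible variation on the same idea, as a sketch only: turn df_cutoff into a named vector first, so the lookup inside across() is by name. The name cutoffs is introduced here for illustration and is not part of the original answer. Selecting with all_of(names(cutoffs)) also limits the conversion to columns that actually have a cutoff.

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
df_cutoff <- data.frame(var = c("a", "b", "c"),
                        cutoff = c(2.5, 7.5, 12.5))

# Hypothetical helper: named lookup vector, c(a = 2.5, b = 7.5, c = 12.5)
cutoffs <- setNames(df_cutoff$cutoff, df_cutoff$var)

result <- df %>%
  mutate(across(all_of(names(cutoffs)),
                # cur_column() is the name of the column being processed
                ~ ifelse(.x <= cutoffs[[cur_column()]], "low", "high"),
                .names = "{.col}2"))
```

With the example data, result should have the same a2, b2 and c2 columns as above.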

CodePudding user response:

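Another approach, in base R: compare the whole data frame against the cutoffs, with each cutoff repeated once per row. This assumes the rows of df_cutoff are in the same order as the columns of df. The result is TRUE/FALSE columns (TRUE where the value is at or below the cutoff) that keep the names a, b and c, rather than 'low'/'high' labels.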

cbind(df, df <= rep(df_cutoff$cutoff, each = nrow(df)))
#   a  b  c     a     b     c
# 1 1  6 11  TRUE  TRUE  TRUE
# 2 2  7 12  TRUE  TRUE  TRUE
# 3 3  8 13 FALSE FALSE FALSE
# 4 4  9 14 FALSE FALSE FALSE
# 5 5 10 15 FALSE FALSE FALSE
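
If 'low'/'high' labels and the a2, b2, c2 names are wanted without dplyr, something like the following base R sketch might work. It is not from the original answer; the names cutoffs, labels and result are made up for illustration. It matches cutoffs to columns by name, so the row order of df_cutoff should not matter.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)
df_cutoff <- data.frame(var = c("a", "b", "c"),
                        cutoff = c(2.5, 7.5, 12.5))

# Line up one cutoff per column of df, matching by variable name
cutoffs <- df_cutoff$cutoff[match(names(df), df_cutoff$var)]

# Walk over the columns and their cutoffs in parallel
labels <- Map(function(x, cut) ifelse(x <= cut, "low", "high"), df, cutoffs)
names(labels) <- paste0(names(df), "2")

# Append the new label columns to the original data
result <- cbind(df, as.data.frame(labels))
```

This assumes every column of df has a row in df_cutoff; a column with no match would get NA labels.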