Mutate a New Column in R from matching substring

Time:12-28

I'm trying to use mutate to create a new column whose name comes from the value of a variable:

for (var in custom_vars) {
  devices_ <-
    devices %>%
      mutate(var = grepl({{var}}, cluster, fixed = TRUE) %>% as_factor())
}

But it's not working. The column is created with the correct boolean values, but its name is set to "var". How do I name the new column after the value of var, as intended?

CodePudding user response:

This is straightforward using grepl in sapply, since that already returns a logical matrix with one named column per pattern:

sapply(custom_vars, grepl, dat$cluster)
#          a     b
# [1,]  TRUE  TRUE
# [2,]  TRUE FALSE
# [3,] FALSE  TRUE
# [4,] FALSE FALSE

Then just cbind it to the initial data frame.

cbind(dat, sapply(custom_vars, grepl, dat$cluster))
#   cluster          x     a     b
# 1     abk 0.06279684  TRUE  TRUE
# 2     akl 0.36972330  TRUE FALSE
# 3     bkl 0.80486702 FALSE  TRUE
# 4     klm 0.32781444 FALSE FALSE

Data:

dat <- data.frame(cluster=c('abk', 'akl', 'bkl', 'klm'), x=runif(4))
custom_vars <- c('a', 'b')  # assumed; inferred from the column names in the output above
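
The question matched with fixed = TRUE and converted the result to a factor, which the sapply call above does not do. A minimal sketch of how both could be added, assuming custom_vars is c('a', 'b') and using a fixed x column instead of runif so the result is reproducible:

```r
# Example data as above; custom_vars is an assumption, not given in the answer.
dat <- data.frame(cluster = c('abk', 'akl', 'bkl', 'klm'), x = 1:4)
custom_vars <- c('a', 'b')

# fixed = TRUE is passed through sapply() to grepl(), so each pattern is
# matched as a literal substring rather than as a regular expression.
flags <- as.data.frame(sapply(custom_vars, grepl, x = dat$cluster, fixed = TRUE))

# Convert every logical column to a factor with both levels always present.
flags[] <- lapply(flags, factor, levels = c(FALSE, TRUE))

# Bind the new columns, named after the patterns, to the original data.
res <- cbind(dat, flags)
```

Setting the levels explicitly keeps both "FALSE" and "TRUE" as levels even when a pattern matches every row or none.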

CodePudding user response:

It is easier to help if you provide example data with the expected output, which can be used to verify the answers.

I tried to create example data based on my understanding.

library(dplyr)

df <- data.frame(a = 1:3, b = 3:1, cluster = c('abc', 'rds', 'dsb'))

#  a b cluster
#1 1 3     abc
#2 2 2     rds
#3 3 1     dsb

custom_vars <- c('a', 'b')

We can use across instead of a for loop.

df <- df %>%  
      mutate(across(all_of(custom_vars), ~factor(grepl(cur_column(), cluster))))

str(df)
#'data.frame':  3 obs. of  3 variables:
# $ a      : Factor w/ 2 levels "FALSE","TRUE": 2 1 1
# $ b      : Factor w/ 2 levels "FALSE","TRUE": 2 1 2
# $ cluster: chr  "abc" "rds" "dsb"

cur_column() returns the name of the current column, which is used as the pattern in grepl; the resulting logical values (TRUE/FALSE) are then converted to a factor.
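
Note that across(all_of(custom_vars), ...) overwrites columns that already exist, and all_of() raises an error if they do not. If the columns still need to be created, as the question seems to describe, one option is to keep the loop and name the column dynamically with !! and :=. This is only a sketch: the devices data below is hypothetical, and custom_vars is assumed to be c('a', 'b').

```r
library(dplyr)

# Hypothetical stand-in for the asker's data.
devices <- data.frame(cluster = c('abk', 'akl', 'bkl', 'klm'))
custom_vars <- c('a', 'b')

for (var in custom_vars) {
  # Reassign to the same object so columns accumulate across iterations.
  # `!!var :=` uses the string stored in var as the new column name;
  # `!!var` inside grepl() injects that string as the literal pattern.
  devices <- devices %>%
    mutate(!!var := factor(grepl(!!var, cluster, fixed = TRUE),
                           levels = c(FALSE, TRUE)))
}
```

The original loop assigned the result to devices_ while always starting from devices, so even with correct names only the last column would have survived; reassigning to the same object avoids that.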
