Recoding multiple variables without using a loop

str1<-c("A","B","C","D","E","F")
str2<-c("Apple", "Mango", "Avocado", "Watermelon", "Banana", "Pineapple")
str3<-c("Mouse","Cat", "Lion", "Shark", "Eagle", "Ladybug")
num1<-c(1:6)
num2<-c(2.3, 3.5, 4, 7, 6.2, 3)
binary1<-c(0,1,0,1,0,0)
binary2<-c(1,1,0,0,0,1)

mydata<-data.frame(str1,str2, str3,num1,num2, binary1, binary2)

It is often said that vectorization is better than looping.

So I am wondering how to recode many variables with vectorization instead of loops.

My first task is to convert str1, str2 and str3 to factors. I used:

for (i in c("str1", "str2", "str3")) {
  # [[i]] extracts the column vector; [i] would return a one-column data frame
  mydata[[i]] <- as.factor(mydata[[i]])
}

My second task is to convert binary1 and binary2 to factors and relabel their values as 0 = No, 1 = Yes. I used:

for (i in c("binary1", "binary2")) {
  # [[i]] extracts the column vector; [i] would return a one-column data frame
  mydata[[i]] <- factor(mydata[[i]], levels = c(0, 1), labels = c("No", "Yes"))
}

How can I use vectorization instead of a loop in each case?

CodePudding user response：

For example, by using dplyr:

library(dplyr)   
mydata %>%
  mutate(across(c(1:3,6:7), ~as.factor(.)),
         across(starts_with("bin"), ~ifelse(. == 1, "Yes", "No")))
  str1       str2    str3 num1 num2 binary1 binary2
1    A      Apple   Mouse    1  2.3      No     Yes
2    B      Mango     Cat    2  3.5     Yes     Yes
3    C    Avocado    Lion    3  4.0      No      No
4    D Watermelon   Shark    4  7.0     Yes      No
5    E     Banana   Eagle    5  6.2      No      No
6    F  Pineapple Ladybug    6  3.0      No     Yes
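
Note that ifelse() returns character vectors, so with the code above binary1 and binary2 end up as character columns rather than factors. If factors are required, as the question asks, one possible variant is sketched below. It assumes the same example data as in the question; the name `recoded` is only illustrative.

```r
library(dplyr)

# Same example data as in the question
mydata <- data.frame(
  str1 = c("A", "B", "C", "D", "E", "F"),
  str2 = c("Apple", "Mango", "Avocado", "Watermelon", "Banana", "Pineapple"),
  str3 = c("Mouse", "Cat", "Lion", "Shark", "Eagle", "Ladybug"),
  num1 = 1:6,
  num2 = c(2.3, 3.5, 4, 7, 6.2, 3),
  binary1 = c(0, 1, 0, 1, 0, 0),
  binary2 = c(1, 1, 0, 0, 0, 1)
)

recoded <- mydata %>%
  mutate(
    # str1, str2, str3 become factors with alphabetical levels
    across(starts_with("str"), as.factor),
    # binary1, binary2 become factors with levels "No" (0) and "Yes" (1)
    across(starts_with("bin"),
           ~ factor(.x, levels = c(0, 1), labels = c("No", "Yes")))
  )

# Inspect the resulting column classes
sapply(recoded, class)
```

Selecting columns by name prefix instead of by position keeps the call valid if the column order changes.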

CodePudding user response：

You can use the map() function from purrr.

# Convert str1, str2 and str3 to factors using map()
mydata[, c("str1", "str2", "str3")] <- 
  purrr::map(mydata[, c("str1", "str2", "str3")], 
             .f = as.factor)

str(mydata)

# Convert binary1 and binary2 to factors, recoding 0 as "No" and 1 as "Yes", using map()
mydata[, c("binary1", "binary2")] <- 
  purrr::map(mydata[, c("binary1", "binary2")], 
             .f = factor, levels = c(0, 1), labels = c("No", "Yes"))

str(mydata)

Output of the first str(mydata) call, before binary1 and binary2 are recoded:

'data.frame':   6 obs. of  7 variables:
 $ str1   : Factor w/ 6 levels "A","B","C","D",..: 1 2 3 4 5 6
 $ str2   : Factor w/ 6 levels "Apple","Avocado",..: 1 4 2 6 3 5
 $ str3   : Factor w/ 6 levels "Cat","Eagle",..: 5 1 4 6 2 3
 $ num1   : int  1 2 3 4 5 6
 $ num2   : num  2.3 3.5 4 7 6.2 3
 $ binary1: num  0 1 0 1 0 0
 $ binary2: num  1 1 0 0 0 1
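
The same pattern also works in base R without any package, since lapply() plays the role that map() plays here. This is a minimal sketch on the question's example data; the helper names `str_cols` and `bin_cols` are illustrative, not part of the original post.

```r
# Same example data as in the question
mydata <- data.frame(
  str1 = c("A", "B", "C", "D", "E", "F"),
  str2 = c("Apple", "Mango", "Avocado", "Watermelon", "Banana", "Pineapple"),
  str3 = c("Mouse", "Cat", "Lion", "Shark", "Eagle", "Ladybug"),
  num1 = 1:6,
  num2 = c(2.3, 3.5, 4, 7, 6.2, 3),
  binary1 = c(0, 1, 0, 1, 0, 0),
  binary2 = c(1, 1, 0, 0, 0, 1)
)

str_cols <- c("str1", "str2", "str3")
bin_cols <- c("binary1", "binary2")

# Apply as.factor to each selected column and assign the results back
mydata[str_cols] <- lapply(mydata[str_cols], as.factor)

# Extra arguments after the function name are passed on to factor()
mydata[bin_cols] <- lapply(mydata[bin_cols], factor,
                           levels = c(0, 1), labels = c("No", "Yes"))

str(mydata)
```

Strictly speaking, lapply() and map() still iterate over the columns internally; what they remove is the explicit for loop and the index bookkeeping.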

CodePudding user response：

Below is an alternative solution using data.table.

Code

library(data.table)

sel_cols1 <- c("str1", "str2", "str3") 
sel_cols2 <- c("binary1", "binary2")

setDT(mydata)[, (sel_cols1) := lapply(.SD, as.factor), .SDcols = sel_cols1
              ][, (sel_cols2) := lapply(.SD, function(x) as.factor(fifelse(x == 0, "No", "Yes"))), .SDcols = sel_cols2][]

Output

#>    str1       str2    str3 num1 num2 binary1 binary2
#> 1:    A      Apple   Mouse    1  2.3      No     Yes
#> 2:    B      Mango     Cat    2  3.5     Yes     Yes
#> 3:    C    Avocado    Lion    3  4.0      No      No
#> 4:    D Watermelon   Shark    4  7.0     Yes      No
#> 5:    E     Banana   Eagle    5  6.2      No      No
#> 6:    F  Pineapple Ladybug    6  3.0      No     Yes

Checking the class of each variable

sapply(mydata,class)
#>      str1      str2      str3      num1      num2   binary1   binary2 
#>  "factor"  "factor"  "factor" "integer" "numeric"  "factor"  "factor"

Created on 2021-11-16 by the reprex package (v2.0.1)
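
One point to keep in mind with this approach: setDT() converts mydata in place and := modifies columns by reference, so the original data frame is changed. If the original should stay untouched, a possible sketch is to work on a copy made with as.data.table(). The reduced data frame and the names `dt`, `str_cols` and `bin_cols` below are illustrative assumptions.

```r
library(data.table)

# Reduced version of the example data; stringsAsFactors = FALSE keeps
# str1 as character regardless of the R version
mydata <- data.frame(
  str1 = c("A", "B", "C", "D", "E", "F"),
  binary1 = c(0, 1, 0, 1, 0, 0),
  binary2 = c(1, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

str_cols <- "str1"
bin_cols <- c("binary1", "binary2")

# as.data.table() returns a copy, unlike setDT(), which converts in place
dt <- as.data.table(mydata)

# Update the selected columns of the copy by reference
dt[, (str_cols) := lapply(.SD, as.factor), .SDcols = str_cols]
dt[, (bin_cols) := lapply(.SD, factor, levels = c(0, 1), labels = c("No", "Yes")),
   .SDcols = bin_cols]

# Compare column classes of the copy and the original
sapply(dt, class)
sapply(mydata, class)
```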