Loop for variable definition in R

Time:12-08

I have a data frame, and I want to define multiple new columns by applying the same function (ntile) to the original version (column) of each variable. I'm not sure whether a loop or something else is the right approach; the example below is a toy version. My actual data frame has more than 20 variables that this needs to be done for.

Basically, I want to create a variable named "original_name"_bin for each numeric variable in my data frame. Each _bin variable is just the ntile function applied to the original (non-_bin) column:

dat1 <- read.table(text = "x1 x2 
10 20
20 30.5
30 40.5
40 20.12
50 25 
70 86  
80 75 
90 45 ", header = TRUE)

num_names <- names(dat1)[sapply(dat1, is.numeric)]
bin_names <- paste(num_names, "bin", sep = "_")

# I want to create columns in the data frame where each var_bin is, e.g.:

library(dplyr)  # ntile() comes from dplyr
dat1$x1_bin <- ntile(dat1$x1, n = 10)

# My attempt at a loop (the ??? is the part I can't work out):

for (i in 1:length(bin_names)){
  assign(paste0("dat1$", bin_names[i]), ntile(???, 10))
}
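Why the assign() attempt can't work as written: assign() treats its first argument as the name of a variable, so it does not parse the string "dat1$x1_bin" as a column reference. A minimal sketch on a made-up three-row data frame (the values are only for illustration):

```r
dat1 <- data.frame(x1 = c(10, 20, 30))

# assign() creates a separate object literally named `dat1$x1_bin`
assign("dat1$x1_bin", 1:3)

exists("dat1$x1_bin")      # TRUE: a standalone variable now exists
"x1_bin" %in% names(dat1)  # FALSE: the data frame itself is unchanged
```

Assigning into the data frame with `[<-` or `[[<-` and a column name string avoids this problem.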

Answer:

Here is one way to do it with base R's lapply() (the binning itself still uses dplyr::ntile()):

dat1 <- read.table(text = "x1 x2 
10 20
20 30.5
30 40.5
40 20.12
50 25 
70 86  
80 75 
90 45 ", header = TRUE)

num_names <- names(dat1)[sapply(dat1, is.numeric)]
bin_names <- paste(num_names, "bin", sep = "_")

# \(x) is R's shorthand anonymous-function syntax (R >= 4.1.0)
dat1[bin_names] <- lapply(dat1[num_names], \(x) dplyr::ntile(x, n = 10))

dat1
#>   x1    x2 x1_bin x2_bin
#> 1 10 20.00      1      1
#> 2 20 30.50      2      4
#> 3 30 40.50      3      5
#> 4 40 20.12      4      2
#> 5 50 25.00      5      3
#> 6 70 86.00      6      8
#> 7 80 75.00      7      7
#> 8 90 45.00      8      6

Created on 2021-12-07 by the reprex package (v2.0.1)
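The lapply() version wraps naturally into a small reusable function. The sketch below assumes dplyr is installed; add_bins() is a hypothetical helper name, not part of any package. Note that with n = 10 and only 8 rows, every value lands in its own bin (1 to 8), so n = 4 is used here to show values actually sharing bins.

```r
library(dplyr)

# Hypothetical helper: add an "<name>_bin" column holding ntile() of every numeric column
add_bins <- function(df, n = 10, suffix = "_bin") {
  num_names <- names(df)[vapply(df, is.numeric, logical(1))]
  bin_names <- paste0(num_names, suffix)
  # lapply() returns a list of integer vectors; assigning that list to
  # df[bin_names] adds one new column per list element
  df[bin_names] <- lapply(df[num_names], function(x) ntile(x, n = n))
  df
}

dat1 <- read.table(text = "x1 x2
10 20
20 30.5
30 40.5
40 20.12
50 25
70 86
80 75
90 45", header = TRUE)

binned <- add_bins(dat1, n = 4)
binned
```

Because the numeric columns are detected inside the function, the same call works unchanged on a data frame with 20+ variables.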


As a base R for loop:

# [[ ]] extracts/assigns a single column as a plain vector
for (i in seq_along(bin_names)) {
  dat1[[bin_names[i]]] <- dplyr::ntile(dat1[[num_names[i]]], n = 10)
}

dat1
#>   x1    x2 x1_bin x2_bin
#> 1 10 20.00      1      1
#> 2 20 30.50      2      4
#> 3 30 40.50      3      5
#> 4 40 20.12      4      2
#> 5 50 25.00      5      3
#> 6 70 86.00      6      8
#> 7 80 75.00      7      7
#> 8 90 45.00      8      6
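Inside a loop, dat1[[...]] is the safer way to pull out a single column: dat1["x1"] returns a one-column data frame, whereas dat1[["x1"]] returns the numeric vector that ntile() is designed to work on. A quick base R check on a small made-up data frame:

```r
dat1 <- data.frame(x1 = c(10, 20, 30), x2 = c(20, 30.5, 40.5))

single_bracket <- dat1["x1"]    # a data.frame with one column
double_bracket <- dat1[["x1"]]  # the underlying numeric vector

class(single_bracket)  # "data.frame"
class(double_bracket)  # "numeric"
```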

With dplyr::across:

library(dplyr)

dat1 %>% 
  mutate(across(all_of(num_names),
                ~ ntile(.x, n = 10),
                .names = "{.col}_bin"))

#>   x1    x2 x1_bin x2_bin
#> 1 10 20.00      1      1
#> 2 20 30.50      2      4
#> 3 30 40.50      3      5
#> 4 40 20.12      4      2
#> 5 50 25.00      5      3
#> 6 70 86.00      6      8
#> 7 80 75.00      7      7
#> 8 90 45.00      8      6
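If you'd rather not build num_names first, across() can select the numeric columns itself with the where() selection helper (available in dplyr 1.0.0 and later). A sketch, run on a fresh copy of the toy data, since where(is.numeric) would also pick up any _bin columns already added; the result is assigned back so the new columns are kept:

```r
library(dplyr)

dat1 <- read.table(text = "x1 x2
10 20
20 30.5
30 40.5
40 20.12
50 25
70 86
80 75
90 45", header = TRUE)

# where(is.numeric) selects every numeric column;
# .names = "{.col}_bin" names each new column after its source column
dat1 <- dat1 %>%
  mutate(across(where(is.numeric), ~ ntile(.x, n = 10), .names = "{.col}_bin"))

dat1
```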

