Dplyr variable names in function R

Time:12-09

I'm trying to write a function using some dplyr functions and I think I'm running into issues with NSE (non-standard evaluation). The function below works when I pass the actual variable names as arguments, but not when I pass elements of the character vectors I created.

I think I need to do something about quoting/unquoting the arguments, but I'm kind of stumped:

Works:

 library(dplyr)
 
 dat1 <- read.table(text = "x1 x2 y
10 20 50
20 30.5 100
30 40.5 200
40 20.12 400
50 25 500
70 86 600
80 75 700
90 45 800", header = TRUE)
 
 num_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]))
 bin_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]), "bin", sep = "_")
 dat1[bin_names] <- lapply(dat1[num_names], function(x) dplyr::ntile(x, n = 10))
 
 
 make_iv <- function(df, variable, bin_variable){
   
   
   df <- df
   ivv <- df %>%
     group_by({{bin_variable}}) %>%
     summarise(N_ = n(),
               min_x = min({{variable}}),
               max_x = max({{variable}}),
               SumY = sum(y),
               perc_obs = (n()/nrow(df)),
               ans = sum(perc_obs))
   
  
   return(ivv)
 }
 
 
 make_iv(df = dat1,
         variable = x1,
         bin_variable = x1_bin)

Does not work:

 dat1 <- read.table(text = "x1 x2 y
10 20 50
20 30.5 100
30 40.5 200
40 20.12 400
50 25 500
70 86 600
80 75 700
90 45 800", header = TRUE)
 
 num_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]))
 bin_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]), "bin", sep = "_")
 dat1[bin_names] <- lapply(dat1[num_names], function(x) dplyr::ntile(x, n = 10))
 
 
 make_iv <- function(df, variable, bin_variable){
   
   
   df <- df
   ivv <- df %>%
     group_by({{bin_variable}}) %>%
     summarise(N_ = n(),
               min_x = min({{variable}}),
               max_x = max({{variable}}),
               SumY = sum(y),
               perc_obs = (n()/nrow(df)),
               ans = sum(perc_obs))
   
  
   return(ivv)
 }
 
 
 make_iv(df = dat1,
         variable = num_names[1],
         bin_variable = bin_names[1])

Answer:

You need to distinguish between passing a variable name as a symbol (a bare, unquoted name) and passing it as a string. NSE works with symbols, i.e. names written without quotes. Your first example passes symbols (x1, x1_bin); your second passes strings (num_names[1] evaluates to "x1"). With {{variable}}, the expression num_names[1] is injected as-is and evaluates to the string "x1" rather than to the column x1. Strings need a different syntax: instead of {{variable}}, use .data[[variable]]:

library(dplyr)

dat1 <- read.table(text = "x1 x2 y
10 20 50
20 30.5 100
30 40.5 200
40 20.12 400
50 25 500
70 86 600
80 75 700
90 45 800", header = TRUE)

num_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]))
bin_names <- paste(colnames(dat1[sapply(dat1, is.numeric)]), "bin", sep = "_")
dat1[bin_names] <- lapply(dat1[num_names], function(x) dplyr::ntile(x, n = 10))


# `variable` and `bin_variable` are column names passed as strings,
# so they are looked up with the .data pronoun instead of {{ }}.
make_iv <- function(df, variable, bin_variable){
  ivv <- df %>%
    group_by(.data[[bin_variable]]) %>%
    summarise(N_ = n(),
              min_x = min(.data[[variable]]),
              max_x = max(.data[[variable]]),
              SumY = sum(y),
              perc_obs = (n()/nrow(df)),
              ans = sum(perc_obs))

  return(ivv)
}


make_iv(df = dat1,
        variable = num_names[1],
        bin_variable = bin_names[1])
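
Since num_names and bin_names are character vectors, the string-based version can presumably be applied to every variable/bin pair in one go. Below is a minimal sketch, assuming that is the eventual goal; the use of Map() and the name `results` are illustrative choices, not something from the question:

```r
library(dplyr)

dat1 <- read.table(text = "x1 x2 y
10 20 50
20 30.5 100
30 40.5 200
40 20.12 400
50 25 500
70 86 600
80 75 700
90 45 800", header = TRUE)

# Note: y is numeric too, so it also ends up in num_names (and gets a y_bin column).
num_names <- colnames(dat1)[sapply(dat1, is.numeric)]
bin_names <- paste(num_names, "bin", sep = "_")
dat1[bin_names] <- lapply(dat1[num_names], function(x) dplyr::ntile(x, n = 10))

# String-based version: column names arrive as character values.
make_iv <- function(df, variable, bin_variable) {
  df %>%
    group_by(.data[[bin_variable]]) %>%
    summarise(N_ = n(),
              min_x = min(.data[[variable]]),
              max_x = max(.data[[variable]]),
              SumY = sum(y),
              perc_obs = n() / nrow(df),
              ans = sum(perc_obs))
}

# Map() pairs num_names[i] with bin_names[i] and returns a list of summaries,
# named after the elements of num_names.
results <- Map(function(v, b) make_iv(dat1, v, b), num_names, bin_names)

results[["x1"]]
```

Each element of `results` is the summary for one variable, so `results[["x2"]]` gives the table for x2 and x2_bin.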

If you haven't seen it, here is a source: the "Programming with dplyr" vignette.
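
A possible alternative, if you would rather keep the original {{ }} version of the function, is to convert the strings to symbols at the call site with rlang::sym() and inject them with !!. This is only a sketch under that assumption; the name `make_iv_sym` is hypothetical and just marks the embraced variant:

```r
library(dplyr)

dat1 <- read.table(text = "x1 x2 y
10 20 50
20 30.5 100
30 40.5 200
40 20.12 400
50 25 500
70 86 600
80 75 700
90 45 800", header = TRUE)

num_names <- colnames(dat1)[sapply(dat1, is.numeric)]
bin_names <- paste(num_names, "bin", sep = "_")
dat1[bin_names] <- lapply(dat1[num_names], function(x) dplyr::ntile(x, n = 10))

# Symbol-based version, as in the question: arguments are embraced with {{ }}.
make_iv_sym <- function(df, variable, bin_variable) {
  df %>%
    group_by({{ bin_variable }}) %>%
    summarise(N_ = n(),
              min_x = min({{ variable }}),
              max_x = max({{ variable }}),
              SumY = sum(y),
              perc_obs = n() / nrow(df),
              ans = sum(perc_obs))
}

# sym() turns the string "x1" into the symbol x1; !! injects that symbol
# into the argument, so the function sees the same thing as a bare name.
res <- make_iv_sym(dat1,
                   variable = !!rlang::sym(num_names[1]),
                   bin_variable = !!rlang::sym(bin_names[1]))
res
```

The .data[[variable]] form keeps the conversion inside the function, while this form leaves the function untouched and puts the conversion on the caller; which one fits better depends on whether the function is mostly called interactively with bare names or programmatically with strings.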