How to call or access a data frame's variables inside a function in R

Time:07-06

I'm struggling to write a function in R that takes the names of a data frame's variables (columns) as arguments.

Say, for example, that I have this data:

test.df <-
  data.frame(
    variable_1 = sample(letters[1:4], 10, replace = TRUE),
    variable_2 = rnorm(10, 10, 3),
    variable_3 = rnorm(10, 40, 15))
    
test.df
    
   variable_1 variable_2 variable_3
1           c   5.514034   59.23525
2           a  10.515690   31.94552
3           d  11.845118   47.39481
4           c   8.481335   22.32198
5           d   7.945798   29.02631
6           c   9.631182   41.90519
7           c   9.348816   53.79478
8           a   4.559642   58.47290
9           d   9.876674   53.53151
10          c  12.955443   49.84759

And I need to create a function that accesses any given variable by its name and, for example, computes and reports its mean in the form 'The mean is: X' (where 'X' is the mean value). So far I've tried this:

my.function <- function(df, variable) {
  paste0("The mean is: ",
         round(mean(df$variable),2))
}

But when I evaluate my.function on test.df, it clearly isn't doing the job:

> my.function(test.df, variable_2)
[1] "The mean is: NA"
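The NA comes from how $ works: it does not evaluate the name that follows it, so df$variable looks for a column literally named "variable" rather than the column whose name was passed in. In test.df no column has that exact name (and, since $ allows partial matching on data frames, "variable" is an ambiguous prefix of all three columns), so $ returns NULL, and mean(NULL) warns and returns NA. A minimal sketch using a small hypothetical data frame (df, a, b and col are illustrative names, not from the question):

```r
# Hypothetical example data: two numeric columns named "a" and "b"
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
col <- "b"

# $ does not evaluate `col`: it looks for a column literally named "col",
# finds none, and returns NULL
print(df$col)

# mean(NULL) warns and returns NA, which is what the question's function reports
print(suppressWarnings(mean(df$col)))

# [[ evaluates its argument, so the string stored in `col` selects column "b"
print(df[[col]])
print(mean(df[[col]]))
```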

So my questions are:

  • How do I refer to variable names passed as a function's arguments? I know there are various ways to do this, since there are other libraries out there that accept either variable_2 or "variable_2" or, when more than one variable is needed, either list the variables unquoted and separated by commas (variable_2, variable_3, as in dplyr::select()) or require the target variables as a character vector (c("variable_2", "variable_3"), as in reshape2::melt()).

  • BONUS: When using functions that take more than one variable, I really like that I can press Tab and see the list of available variables (as in dplyr::select(), for example). How do I get this feature when building my own functions?

Thanks in advance! :)

Answer:

If we are passing the column name as an unquoted argument, convert it to a string with deparse/substitute and use [[ instead of $. Also, add a condition that checks whether the value returned by substitute is a symbol, and only then apply deparse, so that the function accepts both quoted and unquoted names.

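my.function <- function(df, variable) {
  # Capture the argument unevaluated: a symbol if unquoted, a string if quoted
  variable <- substitute(variable)
  # Convert an unquoted name (symbol) to a character string
  if (is.symbol(variable)) variable <- deparse(variable)
  # Unlike $, [[ evaluates its argument, so the string selects the column
  paste0("The mean is: ",
         round(mean(df[[variable]], na.rm = TRUE), 2))
}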

Testing:

> my.function(test.df, variable_2)
[1] "The mean is: 9.86"
> my.function(test.df, "variable_2")
[1] "The mean is: 9.86"
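To see why the is.symbol() check is needed, it helps to look at what substitute() captures in each call style: an unquoted name arrives as a symbol (class "name"), which deparse() turns into a string, while a quoted name arrives already as a character string that [[ can use directly. The helper show_arg() below is hypothetical, written only for illustration:

```r
# Hypothetical helper: report what substitute() captures for an argument
show_arg <- function(variable) {
  captured <- substitute(variable)
  list(
    class = class(captured),
    # deparse() only for symbols; strings are already usable with [[
    name = if (is.symbol(captured)) deparse(captured) else captured
  )
}

str(show_arg(variable_2))    # class "name", name "variable_2"
str(show_arg("variable_2"))  # class "character", name "variable_2"
```

One caveat of this approach: because substitute() captures the expression written at the call site, calling my.function from inside another function that forwards its own argument would capture that argument's name rather than the original column name.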

If we want the means of multiple columns, use colMeans and pass the variable names as a character vector:

my.function <- function(df, variable) {
  v1 <- colMeans(df[variable], na.rm = TRUE)
  sprintf("The mean of %s: %f", names(v1), v1)
}

Testing:

> my.function(test.df, c("variable_2", "variable_3"))
[1] "The mean of variable_2: 9.860057"  "The mean of variable_3: 42.317997"
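The question also mentions dplyr::select()-style calls, where several unquoted names are separated by commas. The same substitute() idea can be extended to that case in base R by capturing the ... arguments. A minimal sketch, assuming every name refers to an existing numeric column; the function name my.means and the data frame below are hypothetical:

```r
# Hypothetical function: accepts any mix of quoted and unquoted column names
my.means <- function(df, ...) {
  # substitute(list(...)) returns the unevaluated call list(name1, name2, ...);
  # dropping its first element (the `list` symbol) leaves one entry per argument
  args <- as.list(substitute(list(...)))[-1]
  # Symbols become strings via deparse(); quoted names are already strings
  cols <- vapply(args,
                 function(a) if (is.symbol(a)) deparse(a) else as.character(a),
                 character(1))
  means <- colMeans(df[cols], na.rm = TRUE)
  sprintf("The mean of %s: %.2f", names(means), means)
}

df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
print(my.means(df, x, y))
print(my.means(df, "y", x))
```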

Answer:

Instead of df$nameOfColumn, you can use:

column <- "nameOfColumn"
df[[column]]
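It may help to contrast [[ with single-bracket [: both accept a column name stored in a variable, but [[ returns the column itself as a vector (which is what mean() needs), while [ returns a one-column data frame. Unlike $, [[ on a data frame also matches names exactly by default. A short sketch with a hypothetical data frame:

```r
# Hypothetical data frame for illustration
df <- data.frame(nameOfColumn = c(2, 4, 6), other = c(1, 1, 1))
column <- "nameOfColumn"

x1 <- df[[column]]   # the column itself: a numeric vector
x2 <- df[column]     # a data frame containing just that column

print(class(x1))     # "numeric"
print(class(x2))     # "data.frame"
print(mean(x1))      # works on the vector; mean(x2) would warn and return NA
```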

Example:

my.function <- function(df, variable) {
  paste0("The mean is: ",
         round(mean(df[[variable]]),2))
}

> my.function(test.df, "variable_2")
[1] "The mean is: 11.88"

This is documented in the R Language Definition, in the section on Indexing.
