I'm struggling to create a function in R that takes the names of a data.frame's variables as arguments.
Say, for example, that I have this data:
test.df <-
  data.frame(
    variable_1 = sample(letters[1:4], 10, replace = TRUE),
    variable_2 = rnorm(10, 10, 3),
    variable_3 = rnorm(10, 40, 15))
test.df
   variable_1 variable_2 variable_3
1           c   5.514034   59.23525
2           a  10.515690   31.94552
3           d  11.845118   47.39481
4           c   8.481335   22.32198
5           d   7.945798   29.02631
6           c   9.631182   41.90519
7           c   9.348816   53.79478
8           a   4.559642   58.47290
9           d   9.876674   53.53151
10          c  12.955443   49.84759
I need to create a function that accesses any given variable by its name and, for example, reports its mean in the form 'The mean is: X' (where 'X' is the mean value). So far I've tried this:
my.function <- function(df, variable) {
paste0("The mean is: ",
round(mean(df$variable),2))
}
But when I evaluate my.function on my test.df, it is clearly not doing the job:
> my.function(test.df, variable_2)
[1] "The mean is: NA"
So my questions are:
How do I use variable names in a function's arguments? I know there are various ways to do this, since other libraries accept either variable_2 or "variable_2". When more than one variable is needed, some take the variables without quotes, separated by commas (variable_2, variable_3, as in dplyr::select()), while others require the target variables as a character vector (c("variable_2", "variable_3"), as in reshape2::melt()).
BONUS: I really like that, in functions that take more than one variable, you can press Tab and the list of available variables shows up (as in dplyr::select(), for example). How do I get this feature when building my own functions?
Thanks in advance! :)
CodePudding user response:
If we are passing an unquoted argument for the column name, convert it to a string with deparse(substitute()) and use [[ instead of $. Also, add a condition that checks whether the value returned by substitute is a symbol and only then calls deparse, so that the function accepts both quoted and unquoted names:
my.function <- function(df, variable) {
variable <- substitute(variable)
if(is.symbol(variable)) variable <- deparse(variable)
paste0("The mean is: ",
round(mean(df[[variable]], na.rm = TRUE),2))
}
-testing
> my.function(test.df, variable_2)
[1] "The mean is: 9.86"
> my.function(test.df, "variable_2")
[1] "The mean is: 9.86"
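To illustrate why the original attempt returned NA, here is a minimal sketch on a hypothetical data frame (demo.df and column are names made up for this illustration, not part of the question). It shows that $ treats whatever follows it as a literal name, while [[ evaluates its argument first.

```r
# Hypothetical data, used only for this illustration
demo.df <- data.frame(height = c(1, 2, 3), weight = c(10, 20, 30))
column <- "height"

# `$` does not evaluate `column`; it looks for a column literally named "column"
dollar.result <- demo.df$column
is.null(dollar.result)

# `[[` evaluates `column` first, so it extracts the "height" column
bracket.result <- demo.df[[column]]
mean(bracket.result)
```

Since no column is called column, the first lookup finds nothing, and taking the mean of that empty result is what yields NA in the question's function.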
If we want to get the mean of multiple columns, use colMeans and pass the variables as a character vector:
my.function <- function(df, variable) {
v1 <- colMeans(df[variable], na.rm = TRUE)
sprintf("The mean of %s: %f", names(v1), v1)
}
-testing
> my.function(test.df, c("variable_2", "variable_3"))
[1] "The mean of variable_2: 9.860057" "The mean of variable_3: 42.317997"
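The question also asks about listing several unquoted names separated by commas. One possible base R sketch, assuming the names are passed through ... and that each argument is either a bare name or a single quoted string (my.function.multi and demo.df are hypothetical names chosen for this example; this is not how dplyr::select() is implemented):

```r
# Hypothetical variant: column names are passed through `...`,
# either as bare names or as single quoted strings
my.function.multi <- function(df, ...) {
  # substitute() captures the arguments unevaluated, as the call list(...)
  dots <- as.list(substitute(list(...)))[-1]  # drop the leading `list`
  # as.character() turns a symbol into its name and leaves a string unchanged
  vars <- unname(vapply(dots, as.character, character(1)))
  v1 <- colMeans(df[vars], na.rm = TRUE)
  sprintf("The mean of %s: %.2f", names(v1), v1)
}

# Hypothetical data, used only for this illustration
demo.df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
demo.result <- my.function.multi(demo.df, a, "b")
demo.result
```

Anything more complex than a bare name or a single string (for example c("a", "b")) is not handled by this sketch and would make vapply() fail.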
CodePudding user response:
Instead of df$nameOfColumn, you can use:
column <- "nameOfColumn"
df[[column]]
Example:
my.function <- function(df, variable) {
paste0("The mean is: ",
round(mean(df[[variable]]),2))
}
> my.function(test.df, "variable_2")
[1] "The mean is: 11.88"
This is described in the R Language Definition, in the section on indexing.