R newbie here. I'm struggling with passing a variable name to my own function. I use subset() inside the function because it is one of the steps, but I'd like to understand the broader logic of passing variable names, especially when they are then used as arguments to other functions (e.g. subset).
myf.subset <- function(data, xvar) {
  new.data <- subset(data, xvar == 0)
  return(new.data)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = x)
This does not work and returns
Error in eval(e, x, parent.frame()) : object 'x' not found
I then tried myf.subset(df, xvar = "x")
which returns an empty data frame.
Another attempt was
myf.subset <- function(data, xvar) {
  new.data <- subset(data, eval(substitute(xvar)) == 0)
  return(new.data)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = "x")
which again returns an empty data frame
[1] x
<0 rows> (or 0-length row.names)
EDITED: The next step would be to run a regression on the subset defined this way, in which case I would add more variables (now incorporating @akron's answer):
myf.subset <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  # new.data <- subset(data, xvar == 0)
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  OLS <- lm(data = new.data, yvar~zvar )
  return(OLS)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE),
                 y = sample(c(0,1), size = 100, replace = TRUE),
                 z = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = x, yvar = y, zvar = z)
Answer:
Use deparse/substitute to convert the unquoted argument to a string, then use [[ to pull the column as a vector, create the logical vector, and subset with [.
myf.subset <- function(data, xvar) {
  xvar <- deparse(substitute(xvar))
  data[data[[xvar]] == 0, , drop = FALSE]
}
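To see why the original attempts fail, it may help to look at what the function body actually receives. The sketch below uses a hypothetical helper, show_name, which is not part of the question or of base R; it only returns what deparse(substitute()) produces. It also suggests why passing "x" to the subset version gave an empty data frame: the condition becomes "x" == 0, which is a single FALSE.

```r
# show_name is a hypothetical helper, used only for illustration
show_name <- function(xvar) deparse(substitute(xvar))

name_unquoted <- show_name(x)    # the symbol x becomes the string "x"
name_quoted   <- show_name("x")  # a string keeps its quotes when deparsed

# Comparing the string "x" with 0 coerces 0 to "0", so the result is FALSE
cmp <- "x" == 0

print(name_unquoted)
print(name_quoted)
print(cmp)
```

One consequence is that the function above expects an unquoted name. Calling it with xvar = "x" would look up a column whose name includes the quote characters, which does not exist, so it should again return zero rows.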
-testing
> myf.subset(df, xvar = x)
x
3 0
5 0
12 0
18 0
20 0
24 0
25 0
28 0
29 0
32 0
33 0
35 0
36 0
37 0
39 0
41 0
42 0
43 0
47 0
48 0
49 0
51 0
55 0
57 0
58 0
62 0
63 0
65 0
66 0
67 0
69 0
70 0
71 0
73 0
74 0
75 0
76 0
80 0
82 0
84 0
87 0
88 0
90 0
92 0
94 0
97 0
99 0
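If you would rather pass the column name as a string, the same [[ lookup works without deparse/substitute. This is only a sketch of that alternative: the name myf.subset.chr and the input check are my own assumptions, not something from the question.

```r
# Hypothetical variant that takes the column name as a character string
myf.subset.chr <- function(data, xvar) {
  # Fail early if xvar is not a single existing column name
  stopifnot(is.character(xvar), length(xvar) == 1, xvar %in% names(data))
  data[data[[xvar]] == 0, , drop = FALSE]
}

# Small fixed data frame so the result does not depend on sample()
df_small <- data.frame(x = c(0, 1, 0, 1))
res <- myf.subset.chr(df_small, "x")
print(res)
```

A string argument is often easier to use when the column name itself is stored in a variable or produced in a loop.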
In the updated code, the formula can be created with reformulate, or with paste followed by as.formula:
myf.subset <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  # new.data <- subset(data, xvar == 0)
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  fmla <- reformulate(zvar, response = yvar)
  # fmla <- as.formula(paste(yvar, zvar, sep = ' ~ '))
  OLS <- lm(fmla, data = new.data)
  return(OLS)
}
-testing
> myf.subset(df, xvar = x, yvar = y, zvar = z)
Call:
lm(formula = fmla, data = new.data)
Coefficients:
(Intercept)            z
    0.48000     -0.01333
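Note that the printed Call shows formula = fmla rather than the variables. As a rough sketch of how the two formula constructions compare, and of one possible way to get the actual formula into the call, the code below uses eval(bquote()). The seed and the object names fmla_paste and fit are assumptions added only to make the example reproducible and self-contained.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(x = sample(c(0, 1), size = 100, replace = TRUE),
                 y = sample(c(0, 1), size = 100, replace = TRUE),
                 z = sample(c(0, 1), size = 100, replace = TRUE))

# Both constructions give the formula y ~ z
fmla <- reformulate("z", response = "y")
fmla_paste <- as.formula(paste("y", "z", sep = " ~ "))
print(fmla)
print(fmla_paste)

# Keep only the rows where x is 0, as in the function above
new.data <- df[df[["x"]] == 0, , drop = FALSE]

# bquote() splices the formula into the call before lm() records it,
# so the stored call contains y ~ z instead of the name fmla
fit <- eval(bquote(lm(.(fmla), data = new.data)))
print(fit$call)
```

This is optional; the fitted coefficients are the same either way, only the recorded call differs.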