A function included in a package is throwing an error when I attempt to supply weights to the function. The portion of the package call is required to be specified like this:
weights = c("kernel_wght")
Inside the function, the following two lines of code are used to specify a data frame object called weight:
weight1 <- sprintf("dataarg$%s", weights)
weight <- as.data.frame(eval(parse(text = weight1)))
However, the analytic portion of the function attempts to use glm to conduct an analysis of data, using the weights provided.
result1 <- glm(f1, family="gaussian", weights=weight, data=dataarg)
Doing so yields the following error:
Error in (function (arg) : object 'weight' not found
I've seen some recommendations that the whole glm call should be re-specified, and I've seen some references to global environment objects. Why can I print the data frame, verifying that it is indeed created, but not refer to it in the call to glm? Is there a fix that I have overlooked?
As requested, here is a small example. I created some sample data, as if it had come from a multiple imputation generating process:
dat <- c(1, 1, 0, .5, 1, 3, 0, 1, 1, 4, 0, .5, 1, 5, 1, 1, 1, 2, 1, .5,
         2, 7, 1, 1, 2, 3, 0, .5, 2, 2, 0, 1, 2, 4, 1, .5)
dat <- data.frame(matrix(dat, ncol = 4, byrow = TRUE))
colnames(dat) <- c("id", "y", "tx", "wt")
imp_lst <- lapply(1:2, function(s) dplyr::filter(dat, id == s))
df_lst <- list()
for (i in 1:length(imp_lst)) {
  assign(paste0("imp", i), as.data.frame(imp_lst[[i]]))
  df_lst <- append(df_lst, list(get(paste0("imp", i))))
  names(df_lst)[i] <- paste0("imp", i)
}
And here is a small example, mostly taken from the package, that re-creates the problem:
my_ex <- function(datasets, y, treatment, weights=NULL, ...) {
  data <- names(datasets)
  for (i in 1:length(treatment)) {
    d1 <- sprintf("datasets$%s", data[i])
    dataarg <- eval(parse(text=d1))
    print(dataarg)
    if(!is.null(weights)) {
      weight1 <- sprintf("dataarg$%s", weights)
      weight <- as.data.frame(eval(parse(text = weight1)))
      print(weight)
    } else {
      dataarg$weight <- weight <- rep(1,nrow(dataarg))
    }
    f1 <- sprintf("%s ~ %s ", y, treatment)
    print(f1)
    result1 <- glm(f1, family="gaussian", weights=weight, data=dataarg)
    print(summary(result1))
  }
}
Using the following call, the error appears:
testrun <- my_ex(df_lst, y = c("y","y"), treatment = c("tx","tx"), weights = c("wt","wt"))
CodePudding user response:
The proximal problem is that you are defining the formula as a character string and passing it to glm. It gets converted to a formula within glm, but when that happens its environment is the environment of glm, so it doesn't know where to look for the weights variable (loosely speaking, glm will look (1) within the data frame provided as data and (2) in the environment of the formula). You can work around this by using as.formula() to convert the string to a formula before passing it to glm (e.g. glm(as.formula(f1), ...)).
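To make the environment point concrete, here is a minimal sketch. The toy data frame and the helper names fit_string and fit_formula are made up for illustration and are not part of the package in question. The string version is wrapped in tryCatch because whether it fails can depend on the R version and on whether an object named w happens to exist in the global environment.

```r
# Hypothetical toy data (not the data from the question)
toy <- data.frame(y  = c(1, 3, 4, 5, 2, 7),
                  tx = c(0, 1, 0, 1, 1, 1),
                  wt = c(0.5, 1, 0.5, 1, 0.5, 1))

# Formula passed as a string: it is only converted to a formula later,
# inside the modelling code, so the formula's environment is not this
# function's frame and the local `w` may not be found.
fit_string <- function(d) {
  w <- d$wt
  glm("y ~ tx", family = gaussian, weights = w, data = d)
}

# Formula created here: its environment is this function's frame,
# which is where `w` lives.
fit_formula <- function(d) {
  w <- d$wt
  f <- as.formula("y ~ tx")
  glm(f, family = gaussian, weights = w, data = d)
}

# Capture the outcome of the string version instead of stopping on error
res_string  <- tryCatch(fit_string(toy), error = function(e) conditionMessage(e))
res_formula <- fit_formula(toy)

print(res_string)          # a fitted model or an error message
print(coef(res_formula))   # intercept and tx coefficient
```

Note that the weights here are a plain numeric vector taken with $, not a one-column data frame as in the question's function.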
However, using functions like eval, parse, and assign is a code smell in R; it means there's probably a more natural, simpler, more robust way to do what you want. For example, I think this function does the same as what your function is trying to do, relying on indexing within lists rather than using eval(parse(...)) and friends.
my_ex2 <- function(datasets, y, treatment, weights = NULL, ...) {
  result <- list()
  for (i in seq_along(treatment)) {
    form <- reformulate(treatment[i], response = y[i])
    data <- datasets[[i]]
    ## note double brackets - we want the weights to be
    ## a vector, not a data frame; NULL means an unweighted fit
    weight <- if (is.null(weights)) NULL else data[[weights[i]]]
    result[[i]] <- glm(formula = form, weights = weight, data = data)
  }
  result
}
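Then, to print out all the summaries, apply summary to the list that the function returns, e.g. result <- my_ex2(df_lst, y = c("y","y"), treatment = c("tx","tx"), weights = c("wt","wt")) followed by lapply(result, summary) (if you really think you only need the summary, you can save the summary instead of the fitted object inside the loop).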
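As a small illustration of the two building blocks used in the function above, the sketch below compares list-style indexing with the eval(parse(...)) approach and shows what reformulate() returns. The data frame d and the variable names are invented for this example.

```r
# Hypothetical one-dataset example
d <- data.frame(y = c(1, 3, 4), tx = c(0, 1, 0), wt = c(0.5, 1, 0.5))
col <- "wt"

# Building code as text and evaluating it ...
via_parse <- eval(parse(text = sprintf("d$%s", col)))

# ... versus indexing by name
w_vec <- d[[col]]   # double brackets: a numeric vector
w_df  <- d[col]     # single brackets: a one-column data frame

print(identical(via_parse, w_vec))
print(class(w_vec))
print(class(w_df))

# reformulate() builds a formula object from variable names,
# so no sprintf() or as.formula() is needed
form <- reformulate("tx", response = "y")
print(form)
```

Because reformulate() is called inside the function, the formula's environment is that function's frame by default, which is why glm can find the local weight vector there.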