my first question on Stack Overflow so bear with me ;-)
I wrote a function to row-bind all objects whose names meet a regex criterion into a dataframe.
Curiously, if I run the lines out of the function, it works perfectly. But within the function, an empty data frame is returned.
Reproducible example:
offers_2022_05 <- data.frame(x = 3)
offers_2022_06 <- data.frame(x = 6)
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
bind_multiple_dates("offers")
# A tibble: 0 × 0
However, this works:
prefix <- "offers"
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix))
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
data
month x
1 offers_2022_05 3
2 offers_2022_06 5
I suppose it has something to do with the environment, but I can't really figure it out. Is there a better way to do this? I would like to keep the code as a function.
Thanks in advance :-)
CodePudding user response:
By default ls()
will look in the current environment when looking for variables. In this case, the current environment is the function body and those data.frame variables are not inside the function scope. You can explicitly set the environment to the calling environment to find using the envir=
parameter. For example
bind_multiple_dates <- function(prefix) {
objects <- ls(pattern = sprintf("%s_[0-9]{4}_[0-9]{2}", prefix), envir=parent.frame())
data <- bind_rows(mget(objects, envir = .GlobalEnv), .id = "month")
return(data)
}
The "better" way to do this is to not create a bunch of separate variables like offers_2022_05
and offers_2022_06
in the first place. Variables should not have data or indexes in their name. It would be better to create the data frames in a list directly from the beginning. Often this is easily accomplished with a call to lapply
or purrr::map
. See this existing question for more info