Is there a straightforward way to remove data that I no longer need from the environment, other than calling the rm() function? The example here uses a join, but it could just as well be a basic merge(). Here is a simple example with only 2 datasets, but I actually have many more than that.
library(tidyverse)
library(lubridate)
x <- lubridate::lakers %>%
  mutate(Month = lubridate::month(lubridate::as_date(date)))
y <- datasets::airquality
z <- y %>%
  dplyr::inner_join(x, by = "Month")
rm(x,y)
CodePudding user response:
It is possible to do this, but it is really bad practice to automatically delete data from the calling frame as a side effect of a function.
Please don't ever use this IRL - it's only to demonstrate that it can be done, not that it should be done.
First we start with an empty workspace:
ls()
#> character(0)
Now we define two data frames we are going to join in our function:
df1 <- data.frame(x = 1:5, y = 1:5)
df2 <- data.frame(x = 6:10, y = 6:10)
ls()
#> [1] "df1" "df2"
The following function removes the input data frames from the calling environment and returns the two joined in a simple rbind:
dangerous_function <- function(data1, data2) {
  data3 <- rbind(data1, data2)
  # Recover the names of the objects passed in and delete them from the caller
  rm(list = c(deparse(substitute(data1)),
              deparse(substitute(data2))), envir = parent.frame())
  data3
}
df3 <- dangerous_function(df1, df2)
And we can see that df1 and df2 have indeed been deleted:
ls()
#> [1] "dangerous_function" "df3"
You could even write the function so that it deletes itself after it has been used, which would probably be for the best.
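As a rough sketch of that self-deleting idea, and with the same warning attached: the variant below is hypothetical (the name self_destructing_rbind is made up for illustration). It uses deparse(substitute(arg)) to recover the names of the objects passed in, and sys.call()[[1]] to recover the name the function itself was called by, then removes all three from the calling frame.

```r
# Hypothetical variant of dangerous_function that also removes itself.
# Demonstration only: do not use this in real code.
self_destructing_rbind <- function(data1, data2) {
  data3 <- rbind(data1, data2)            # evaluates both arguments first
  doomed <- c(
    deparse(substitute(data1)),           # text of the first argument, e.g. "df1"
    deparse(substitute(data2)),           # text of the second argument, e.g. "df2"
    as.character(sys.call()[[1]])         # name the function was called by
  )
  rm(list = doomed, envir = parent.frame())
  data3
}

df1 <- data.frame(x = 1:5, y = 1:5)
df2 <- data.frame(x = 6:10, y = 6:10)
df3 <- self_destructing_rbind(df1, df2)

# Only df3 should be left of the three objects involved in the call.
exists("df1", inherits = FALSE)
exists("self_destructing_rbind", inherits = FALSE)
```

This only works when each argument is a bare object name. If you pass an expression such as head(df1), deparse(substitute(data1)) yields the text "head(df1)", and rm() has no object of that name to remove, which is one more reason the approach is fragile.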
CodePudding user response:
You can achieve this by not creating the intermediate objects in the first place. What I mean is: access the data frames directly within the dplyr expression:
z <- datasets::airquality %>%
  dplyr::inner_join(lubridate::lakers %>%
    dplyr::mutate(Month = lubridate::month(lubridate::as_date(date))), by = "Month")
Or a simplified version without specifying the packages in the code:
z <- airquality %>%
  inner_join(lakers %>%
    mutate(Month = month(as_date(date))), by = "Month")
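Note: the simplified version assumes you have already loaded the packages, while the longer version qualifies each function and dataset with its package (for example "dplyr::") so that it does not depend on which packages are attached. The %>% operator itself still has to be available, for example through library(dplyr).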
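If you have many datasets and nesting everything in one pipeline gets unwieldy, a related option is to build the intermediates inside local(), so they never reach your workspace and there is nothing to rm() afterwards. This is a minimal sketch of a different technique from the pipeline above. It uses base merge() with Reduce() rather than dplyr, and the three small data frames are made-up stand-ins for the real data.

```r
z <- local({
  # Toy stand-ins for the real datasets (hypothetical values).
  x <- data.frame(Month = c(5L, 6L, 7L), points = c(10, 20, 30))
  y <- data.frame(Month = c(5L, 6L, 8L), ozone = c(41, 36, 12))
  w <- data.frame(Month = c(5L, 6L, 9L), wind = c(7.4, 8.0, 6.9))

  # Fold merge() over the list so that any number of data frames can be joined.
  Reduce(function(a, b) merge(a, b, by = "Month"), list(y, x, w))
})

# x, y and w existed only inside local(); only z was assigned outside it.
names(z)
```

The value of the last expression inside local() is what gets assigned to z. Everything else created in the block is discarded when it finishes.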