Remove unused datasets after manipulation

Time:03-19

Is there a straightforward way to remove data I no longer need from the environment, other than calling rm() by hand? The example below uses a dplyr join, but it could just as well be the base merge(). It has only 2 datasets, but in practice I have many more than that.

library(tidyverse)
library(lubridate)

# lakers$date is stored as an integer such as 20081028, so parse it with ymd()
x <- lubridate::lakers %>%
  mutate(Month = lubridate::month(lubridate::ymd(date)))

y <- datasets::airquality 

z <- y %>% 
  dplyr::inner_join(x, by = "Month")

rm(x,y)

CodePudding user response:

It is possible to do this, but it is really bad practice to automatically delete data from the calling frame as a side effect of a function.

Please don't ever use this IRL - it's only to demonstrate that it can be done, not that it should be done.

First we start with an empty workspace:

ls()
#> character(0)

Now we define two data frames we are going to join in our function:

df1 <- data.frame(x = 1:5, y = 1:5)
df2 <- data.frame(x = 6:10, y = 6:10)
ls()
#> [1] "df1" "df2"

The following function combines its two inputs with a simple rbind, removes the input data frames from the calling environment, and returns the combined result:

dangerous_function <- function(data1, data2) {
  data3 <- rbind(data1, data2)
  rm(list = c(deparse(substitute(data1)), 
       deparse(substitute(data2))), envir = parent.frame())
  data3
}

df3 <- dangerous_function(df1, df2)

And we can see that df1 and df2 have indeed been deleted.

ls()
#> [1] "dangerous_function" "df3"

You could even write the function so that it deletes itself after it has been used, which would probably be for the best.
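
For what it's worth, a self-deleting variant might look roughly like the sketch below. The name self_deleting_bind is made up for illustration, and it assumes the function is called directly by name with plain variable names as arguments. The same warning applies: this only shows that it can be done.

```r
# Demonstration only; `self_deleting_bind` is a hypothetical name.
self_deleting_bind <- function(data1, data2) {
  caller <- parent.frame()
  # Names to delete: both arguments as written at the call site,
  # plus the name this function was called by.
  targets <- c(deparse(substitute(data1)),
               deparse(substitute(data2)),
               deparse(sys.call()[[1]]))
  data3 <- rbind(data1, data2)        # build the result before removing anything
  rm(list = targets, envir = caller)  # side effect on the caller's environment
  data3
}

df1 <- data.frame(x = 1:5, y = 1:5)
df2 <- data.frame(x = 6:10, y = 6:10)
df3 <- self_deleting_bind(df1, df2)
```

After the call, only df3 should be left of the objects created here; df1, df2 and the function itself are removed from the calling environment.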

CodePudding user response:

You can achieve this by not creating the intermediate objects in the first place. What I mean is: access the data frames directly inside the dplyr expression:

# lakers$date is an integer such as 20081028, so it is parsed with ymd()
z <- datasets::airquality %>% 
  dplyr::inner_join(lubridate::lakers %>%
                    dplyr::mutate(Month = lubridate::month(lubridate::ymd(date))), by = "Month")

Or a simplified version without specifying the packages in the code:

z <- airquality %>% inner_join(lakers %>%
                      mutate(Month = month(ymd(date))), by = "Month")

Note: the simplified version assumes you have already loaded the packages, while the longer version qualifies each function and dataset with its package (for example "dplyr::"), so apart from the %>% pipe itself it does not depend on the packages being attached.
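
If the intermediate steps are too long to nest comfortably, a related way to keep temporaries out of the workspace is to build them inside local(). This is a different technique from the nested pipe above. The sketch below is only an illustration: it uses base merge() and a small made-up data frame x (the games column is invented) in place of the lakers data, so that it runs without extra packages.

```r
# x and y exist only inside local(); only z is assigned in the workspace.
z <- local({
  x <- data.frame(Month = c(5, 6, 7), games = c(10, 12, 9))  # made-up stand-in data
  y <- datasets::airquality
  merge(y, x, by = "Month")  # the value of the last expression is returned
})
```

Since x and y never exist outside the local() block, there is nothing to rm() afterwards. Wrapping the same steps in an ordinary function would have the same effect.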
