Applying a function to all data.frames in the environment

Time:04-19

I would like to use the cleanfunction below on all data.frames in my environment.

cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
      stop("matrix variables with 'AsIs' class must be 'numeric'")
      }
  ## identify columns that needs be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  return(dataframe)
}

library(data.table)

set.seed(10238)
DT = data.table(
  A = rep(1:3, each = 5L), 
  B = rep(1:5, 3L),
  C = sample(15L),
  D = sample(15L)
)
DT_II <- copy(DT)
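## keep only the names of data.frames; a plain ls() would also return "cleanfunction"
dfs <- Filter(function(nm) is.data.frame(get(nm)), ls())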

Now I want to apply this function to all data.frames in the environment. I have tried about ten things, but I cannot get the syntax right.

for (i in seq_along(dfs)) {
  get(dfs[i])[ , lapply(.SD, cleanfunction)]
}

EDIT:

I found this solution, but it does not store the result.

eapply(globalenv(), function(x) if (is.data.frame(x)) cleanfunction(x))

How do I store the result in each object?

CodePudding user response:

You call get(dfs[i]), which returns a reference to a data.table, but lapply(.SD, cleanfunction) then passes each column of that frame to the function separately. From the argument name dataframe, I infer that the function expects a whole frame. One might start with:

for (i in seq_along(dfs)) {
  get(dfs[i])[ , cleanfunction(.SD)]
}

but note that this operation returns a new frame; it does not use the canonical data.table mechanisms for updating data in place. I suggest you update your function to always coerce its input to a data.table and work on it by reference.
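To make the difference between the two j-expressions concrete, here is a minimal sketch on a made-up table. The names toy, per_column and whole_frame are illustrative only and do not appear in the question; the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table, used only for illustration
toy <- data.table(a = 1:2, b = c("x", "y"))

# lapply(.SD, f) calls f once per column, so f only ever sees plain vectors
per_column <- toy[, lapply(.SD, is.data.frame)]

# f(.SD) calls f once, with the whole subset of data passed as a frame
whole_frame <- toy[, is.data.frame(.SD)]

print(per_column)   # one FALSE per column
print(whole_frame)  # TRUE
```

This is why a function written for a whole frame should receive .SD directly rather than be mapped over it.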

cleanfunction <- function(dataframe) {
  setDT(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
      stop("matrix variables with 'AsIs' class must be 'numeric'")
      }
  ## identify columns that needs be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
  return(dataframe)
}

Since your current data does not trigger any changes, I'll update one:

DT[,quux:="A"]
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <char>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

for (i in seq_along(dfs)) cleanfunction(get(dfs[i]))
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <fctr>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

Note that the for loop is relying solely on referential updates; the return value from cleanfunction is ignored here.
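As a small illustration of why ignoring the return value is safe here, the sketch below contrasts a data.table updated with := against a plain data.frame. The helpers add_flag_dt and add_flag_df and the toy objects are hypothetical and exist only for this example; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical helper: := adds the column to the caller's data.table in place
add_flag_dt <- function(dt) {
  dt[, flag := TRUE]
  invisible(dt)
}

# Hypothetical helper: the assignment only changes a local copy of df
add_flag_df <- function(df) {
  df$flag <- TRUE
  df
}

toy_dt <- data.table(x = 1:3)
toy_df <- data.frame(x = 1:3)

add_flag_dt(toy_dt)             # return value ignored, toy_dt is still updated
invisible(add_flag_df(toy_df))  # return value ignored, so the change is lost

dt_changed <- "flag" %in% names(toy_dt)
df_changed <- "flag" %in% names(toy_df)
print(c(dt_changed = dt_changed, df_changed = df_changed))
```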

This method works entirely because of data.table's reference semantics; if you were using data.frame or tbl_df, you would likely need to reassign the result explicitly, as in assign(dfs[i], cleanfunction(get(dfs[i]))).
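For plain data.frames, the reassignment pattern might look like the sketch below. It uses base R only. The function to_factors is a simplified, hypothetical stand-in for cleanfunction, and a fresh environment env is used instead of the global environment so the example stays self-contained.

```r
# Hypothetical, simplified stand-in for cleanfunction (not the original)
to_factors <- function(df) {
  is_target <- vapply(
    df,
    function(col) is.character(col) || is.logical(col),
    logical(1)
  )
  if (any(is_target)) df[is_target] <- lapply(df[is_target], as.factor)
  df
}

# A scratch environment standing in for globalenv()
env <- new.env()
env$df1 <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
env$df2 <- data.frame(ok = c(TRUE, FALSE))
env$not_a_frame <- 42

# Overwrite each data.frame with its cleaned copy; skip everything else
for (nm in ls(env)) {
  obj <- get(nm, envir = env)
  if (is.data.frame(obj)) assign(nm, to_factors(obj), envir = env)
}

str(env$df1)
```

Against the question's setup, the same loop would use globalenv() (or omit envir) in place of env.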

CodePudding user response:

Does this work for you?

# store all data.frames from the environment in a list
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))

# then apply your function and keep the returned list of cleaned frames
dfs <- lapply(dfs, cleanfunction)
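The list returned by lapply holds cleaned copies, so the objects in the environment keep their old contents unless the results are written back. One possible way to do that with base R is list2env, sketched below. The environment env, the objects in it, and the inline cleaning function are illustrative assumptions, not part of the original answer.

```r
# A scratch environment standing in for globalenv()
env <- new.env()
env$df1 <- data.frame(label = c("a", "b"), stringsAsFactors = FALSE)
env$n <- 1L

# Collect only the data.frames, keyed by their object names
frames <- Filter(is.data.frame, mget(ls(env), envir = env))

# Hypothetical cleaning step: turn character columns into factors
cleaned <- lapply(frames, function(df) {
  df[] <- lapply(df, function(col) if (is.character(col)) as.factor(col) else col)
  df
})

# Write each cleaned frame back under its original name
invisible(list2env(cleaned, envir = env))

str(env$df1)
```

Because the list is named after the original objects, list2env overwrites each one in place and leaves non-frame objects such as n untouched.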