I would like to use the cleanfunction below on all data.frames in my environment.
cleanfunction <- function(dataframe) {
dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
stop("matrix variables with 'AsIs' class must be 'numeric'")
}
## identify columns that need to be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
return(dataframe)
}
library(data.table)
set.seed(10238)
DT = data.table(
A = rep(1:3, each = 5L),
B = rep(1:5, 3L),
C = sample(15L),
D = sample(15L)
)
DT_II <- copy(DT)
dfs <- ls()
Now I want to apply this function to all data.frames in the environment. I have tried about ten things, but I cannot get the syntax right.
for (i in seq_along(dfs)) {
get(dfs[i])[ , lapply(.SD, cleanfunction)]
}
EDIT:
I found this solution, but it does not store the result.
eapply(globalenv(), function(x) if (is.data.frame(x)) cleanfunction(x))
How do I store the result in each object?
CodePudding user response:
You call get(dfs[i]), which returns a reference to a data.table, but then you lapply over each column of that frame, whereas I infer from the function argument dataframe that you expect a full frame. One might start with:
for (i in seq_along(dfs)) {
get(dfs[i])[ , cleanfunction(.SD)]
}
but realize that this operation returns a new frame; it does not use the canonical data.table mechanisms for updating data in place. I suggest you update your function to always force a data.table and work on it referentially.
cleanfunction <- function(dataframe) {
setDT(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
stop("matrix variables with 'AsIs' class must be 'numeric'")
}
## identify columns that need to be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
return(dataframe)
}
Since your current data does not trigger any changes, I'll update one:
DT[,quux:="A"]
head(DT)
# A B C D quux
# <int> <int> <int> <int> <char>
# 1: 1 1 12 15 A
# 2: 1 2 4 6 A
# 3: 1 3 5 7 A
# 4: 1 4 9 1 A
# 5: 1 5 6 14 A
# 6: 2 1 15 13 A
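## dfs holds every name from ls(), including cleanfunction itself, so skip non-frames
for (i in seq_along(dfs)) if (is.data.frame(get(dfs[i]))) cleanfunction(get(dfs[i]))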
head(DT)
# A B C D quux
# <int> <int> <int> <int> <fctr>
# 1: 1 1 12 15 A
# 2: 1 2 4 6 A
# 3: 1 3 5 7 A
# 4: 1 4 9 1 A
# 5: 1 5 6 14 A
# 6: 2 1 15 13 A
Note that the for loop is relying solely on referential updates; the return value from cleanfunction is ignored here.
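To see why the return value can be dropped, here is a minimal sketch of data.table's by-reference behaviour. The table demo and the helper factorize_txt are made up for illustration and are not part of the question.

```r
library(data.table)

## hypothetical toy table, not the asker's data
demo <- data.table(id = 1:3, txt = c("a", "b", "c"))

## hypothetical helper: `:=` replaces the column in place, without copying the table
factorize_txt <- function(d) {
  d[, txt := as.factor(txt)]
  invisible(d)
}

factorize_txt(get("demo"))   # return value deliberately discarded
class(demo$txt)              # "factor": the global object itself was modified
```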
This method works entirely because of data.table referential semantics; if you were using data.frame or tbl_df, this would likely require wrapping that call to cleanfunction(.) with assign(dfs[i], cleanfunction(..)).
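For plain data.frames, a rough sketch of that assign() variant might look like the following. It assumes a copy-returning cleaner like the one in the question; the stand-in clean_copy and the sandbox environment env are hypothetical and only keep the example self-contained.

```r
## hypothetical stand-in for the question's copy-returning cleanfunction
clean_copy <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ind1 <- which(sapply(dataframe, mode) %in% c("logical", "character"))
  if (length(ind1)) dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  dataframe
}

## hypothetical sandbox environment so the sketch does not touch globalenv()
env <- new.env()
env$df1 <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
env$not_a_frame <- "skip me"

for (nm in ls(env)) {
  obj <- get(nm, envir = env)
  ## only frames are cleaned; the result is written back under the same name
  if (is.data.frame(obj)) assign(nm, clean_copy(obj), envir = env)
}

class(env$df1$b)   # "factor"
```

In a real session, envir = globalenv() would take the place of env.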
CodePudding user response:
Does this work for you?:
# store all data.frames from the environment in a list
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
# then apply your function
lapply(dfs, cleanfunction)
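Note that lapply() on its own only returns the cleaned copies. One possible way to store them back under their original names is list2env(). A small sketch, where to_factors is a hypothetical stand-in for cleanfunction and env is a throwaway environment:

```r
## hypothetical stand-in for cleanfunction: character columns become factors
to_factors <- function(d) {
  is_chr <- vapply(d, is.character, logical(1))
  if (any(is_chr)) d[is_chr] <- lapply(d[is_chr], as.factor)
  d
}

## throwaway environment with two frames and one non-frame object
env <- new.env()
env$df1 <- data.frame(b = c("x", "y"), stringsAsFactors = FALSE)
env$df2 <- data.frame(n = 1:2)
env$other <- 42

## collect the frames into a named list, clean them, then write them back
frames  <- Filter(is.data.frame, mget(ls(env), envir = env))
cleaned <- lapply(frames, to_factors)
invisible(list2env(cleaned, envir = env))

class(env$df1$b)   # "factor"
```

With envir = globalenv() the same pattern would overwrite the objects in the workspace.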