I would like to create a function whose argument (input) is an unknown number of data frames (it can vary) and whose output is a data frame with the data type of each column of the input data frames.
Example: I have the 2 data frames below (the number of data frames can vary, so I am not sure how to pass them as function arguments).
# Dataframe 1
kpi_id <- c("SL", "OOS")
kpi_val <- c (1,2)
df1 <- data.frame(kpi_id, kpi_val)
> sapply(df1, class)
     kpi_id     kpi_val
"character"   "numeric"
# Dataframe 2
kpi_id <- c("SL", "OOS")
kpi_val <- c ("3", "4")
df2 <- data.frame(kpi_id, kpi_val)
> sapply(df2, class)
     kpi_id     kpi_val
"character" "character"
I can get the result manually as below:
library(dplyr)  # for bind_cols()
df_types1 <- as.data.frame(sapply(df1, class))
colnames(df_types1)[1] <- deparse(substitute(df1))
df_types2 <- as.data.frame(sapply(df2, class))
colnames(df_types2)[1] <- deparse(substitute(df2))
df_types3 <- bind_cols(df_types1, df_types2)
> df_types3
              df1       df2
kpi_id  character character
kpi_val   numeric character
How can I create a function that produces the same output when the number of data frames is not known in advance?
CodePudding user response:
Here is a function you can use. Pass it a list of data frames; the list can be named or unnamed:
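df_types <- function(dfs) {
  do.call(
    rbind,
    lapply(seq_along(dfs), function(d) {
      data.frame(
        # label rows with the list name, or with the position if the list is unnamed
        df = if (is.null(names(dfs))) d else names(dfs)[d],
        col = names(dfs[[d]]),
        # typeof() reports the storage type, e.g. "double" where class() says "numeric"
        type = sapply(dfs[[d]], typeof),
        row.names = NULL
      )
    })
  )
}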
Usage:
df_types(list(a = df1, b = df2))
Output:
  df     col      type
1  a  kpi_id character
2  a kpi_val    double
3  b  kpi_id character
4  b kpi_val character
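If you would rather pass the data frames directly instead of wrapping them in a list, `...` can collect any number of arguments. The sketch below is only one possible approach: `col_types` is a hypothetical name, it uses `data.class` rather than `typeof`, and it assumes every data frame has the same columns in the same order.

```r
# Hypothetical helper: one column of classes per data frame passed via `...`.
# Assumes all data frames share the same columns in the same order.
col_types <- function(...) {
  dfs <- list(...)
  # Default labels: the argument expressions as typed, e.g. "df1", "df2"
  arg_names <- vapply(as.list(substitute(list(...)))[-1],
                      function(e) deparse(e)[1], character(1))
  nms <- names(dfs)
  if (is.null(nms)) nms <- rep("", length(dfs))
  unnamed <- nms == ""
  nms[unnamed] <- arg_names[unnamed]
  # One named character vector of column classes per data frame
  types <- lapply(dfs, function(df) vapply(df, data.class, character(1)))
  names(types) <- nms
  # cbind() places the vectors side by side; row names come from the column names
  as.data.frame(do.call(cbind, types))
}

# Same example data as in the question
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2))
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"))
res <- col_types(df1, df2)
# res has rows kpi_id, kpi_val and columns df1, df2
```

Named arguments such as `col_types(a = df1, b = df2)` use the supplied names as column headers instead of the variable names.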
CodePudding user response:
Using rapply:
rapply(list(df1=df1, df2=df2), class, how='list') |>
  do.call(what='cbind')
#         df1         df2
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
If you get odd output because some columns have multiple classes,
df1$date <- df2$date <- as.POSIXct(Sys.Date())
rapply(list(df1=df1, df2=df2), class, how='list') |>
  do.call(what='cbind')
#         df1         df2
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    character,2 character,2
you could use data.class, which returns just the first class:
rapply(list(df1=df1, df2=df2), data.class, how='list') |>
  do.call(what='cbind')
#         df1         df2
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    "POSIXct"   "POSIXct"
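If you want to keep every class rather than only the first, one option is to collapse them into a single string per column. This is a sketch rather than part of the answer above: it uses `sapply`/`vapply` instead of `rapply`, `all_classes` is a hypothetical helper name, and it again assumes all data frames share the same columns.

```r
# Same example data as in the question, plus a column with two classes
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2))
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"))
df1$date <- df2$date <- as.POSIXct(Sys.Date())

# Join multiple classes with "/" so each column yields exactly one string
all_classes <- function(x) paste(class(x), collapse = "/")

# Character matrix: one row per column name, one column per data frame
res <- sapply(list(df1 = df1, df2 = df2),
              function(df) vapply(df, all_classes, character(1)))
res["date", "df1"]
# "POSIXct/POSIXt"
```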