Function for checking the data types of several data frames in R

Time:05-10

I would like to create a function whose input is an unknown (variable) number of data frames and whose output is a data frame giving the data type of each column of the input data frames.

Example: I have the 2 data frames below (the number of data frames can vary, so I am not sure how to pass them as function arguments).


# Dataframe 1
kpi_id <- c("SL", "OOS")
kpi_val <- c(1, 2)

df1 <- data.frame(kpi_id, kpi_val)

> sapply(df1, class)

   kpi_id     kpi_val 
"character"   "numeric"

# Dataframe 2
kpi_id <- c("SL", "OOS")
kpi_val <- c("3", "4")

df2 <- data.frame(kpi_id, kpi_val)

> sapply(df2, class)
  kpi_id     kpi_val 
"character" "character"

I can get the result manually for a fixed number of data frames, as below:

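library(dplyr)  # provides bind_cols()

# One-column data frame of column classes, named after the source data frame
df_types1 <- as.data.frame(sapply(df1, class))
colnames(df_types1)[1] <- deparse(substitute(df1))

df_types2 <- as.data.frame(sapply(df2, class))
colnames(df_types2)[1] <- deparse(substitute(df2))

# Combine the per-data-frame results side by side
df_types3 <- bind_cols(df_types1, df_types2)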

> df_types3
              df1       df2
kpi_id  character   character
kpi_val   numeric   character

How can I write a function that produces the same output when the number of data frames is not known in advance?

CodePudding user response:

Here is a function you can use; pass it a list of data frames, which may be named or unnamed:

df_types <- function(dfs) {
  do.call(
    rbind,
    lapply(seq_along(dfs), function(d) {
      data.frame(
        # Label rows with the list name if available, else the list position
        df = if (is.null(names(dfs))) d else names(dfs)[d],
        col = names(dfs[[d]]),
        type = sapply(dfs[[d]], typeof),
        row.names = NULL
      )
    })
  )
}

Usage

df_types(list("a" = df1,"b" = df2))

Output:

  df     col      type
1  a  kpi_id character
2  a kpi_val    double
3  b  kpi_id character
4  b kpi_val character
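
If you would rather pass the data frames directly as separate arguments instead of wrapping them in a list, the same idea can probably be sketched with `...`. The name df_types_dots is hypothetical (it is not part of the answer above), and the sketch assumes that all data frames share the same columns, as df1 and df2 do, and that reporting only the first class of each column is good enough.

```r
# Example data from the question
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2))
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"))

# Hypothetical helper: accepts any number of data frames via `...`
df_types_dots <- function(...) {
  dfs <- list(...)
  # Recover the expressions used in the call, e.g. "df1" and "df2"
  exprs <- as.list(substitute(list(...)))[-1]
  arg_names <- vapply(exprs, function(e) deparse(e)[1], character(1),
                      USE.NAMES = FALSE)
  # Explicit argument names (e.g. a = df1) take precedence
  if (!is.null(names(dfs))) {
    has_name <- nzchar(names(dfs))
    arg_names[has_name] <- names(dfs)[has_name]
  }
  names(dfs) <- arg_names
  # One named character vector of classes per data frame
  # (assumption: only the first class of each column is reported)
  types <- lapply(dfs, function(d) {
    vapply(d, function(col) class(col)[1], character(1))
  })
  # Bind the vectors as columns; row names come from the column names
  as.data.frame(do.call(cbind, types))
}

res <- df_types_dots(df1, df2)
res
# kpi_val should be "numeric" for df1 and "character" for df2
```

This returns the wide layout asked for in the question (one column per data frame), whereas df_types() above returns one row per data frame and column.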

CodePudding user response:

Using rapply.

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                 df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"

If you get hard-to-read output because a column has multiple classes,

df1$date <- df2$date <- as.POSIXct(Sys.Date())

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    character,2 character,2

you could use data.class, which returns a single class name (the first one when there are several):

rapply(list(df1=df1, df2=df2), data.class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    "POSIXct"   "POSIXct"
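
If you would rather see every class than just the first one, one possible variation is to paste the classes of each column into a single string. This is only a sketch: all_classes is a hypothetical helper name, and "/" is an arbitrary separator.

```r
# Example data from the question, plus a column with two classes
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2))
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"))
df1$date <- df2$date <- as.POSIXct(Sys.Date())

# Hypothetical helper: join all classes of a column into one string
all_classes <- function(x) paste(class(x), collapse = "/")

types <- do.call(
  cbind,
  rapply(list(df1 = df1, df2 = df2), all_classes, how = "list")
)
types
# the date row should read "POSIXct/POSIXt" for both data frames
```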