I wrote this helper function to compare a variable number of input tables using calls to janitor::compare_df_cols.
Sometimes I have a named list of data frames, and sometimes I write their names directly when there are only 2 or 3.
I want the helper function to accept either ... or a list(), in both cases containing data frames.
Ideally, I want ... to be converted into a named list whose names preserve those of the variables passed as arguments at the call site (e.g. people, people2, people3).
However, ... loses its names, and it behaves differently depending on whether the conversion to list() is done inside or outside the function.
EXAMPLE DATASET
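library(dplyr)    # %>%, mutate, mutate_all
library(stringr)  # str_replace_all (used by the helper)
lastnames <- LETTERS[1:4]; names <- c("uno", "dos", "tres", "cuatro")
age <- 1:4; height <- seq(190, 200, 3)
people <- data.frame(names, lastnames, age, height)
# The operator was missing in the original post; `*` is assumed (age becomes numeric)
people2 <- people %>% mutate(age = age * 20)
people3 <- people; people3$height[[3]] <- 160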
HELPER FUNCTION
helper_df_compare = function( ..., a_default_arg="def" ){
##### Compare mismatching column types
##### NOTE: it does not check the contents
rbind(
janitor::compare_df_cols( ..., return="mismatch" ) %>%
mutate( column_name = paste("!!!", column_name) ),
janitor::compare_df_cols( ..., return="match" )
) %>%
mutate_all( ~str_replace_all(.,c(
"integer"="int", "numeric"="num", "character"="chr", "factor"="fct",
"POSIXct, POSIXt"="POSIXct"
) ) )
}
INTENDED OPTIONAL CALLING METHODS
helper_df_compare( database_rnamedlist ) # <- preferred
helper_df_compare( list(people, people2, people3) )
helper_df_compare( people, people2, people3 ) # <- preferred
helper_df_compare( list("A"=people, "B"=people2, "C"=people3) )
helper_df_compare( A=people, B=people2, C=people3 ) # <- preferred
CURRENT OUTPUTS (note: the column names should be the table names passed as arguments)
column_name ..1_1 ..1_2 ..1_3
1 !!! age int num int
2 height num num num
3 lastnames chr chr chr
4 names chr chr chr
column_name A B C
1 !!! age int num int
2 height num num num
3 lastnames chr chr chr
4 names chr chr chr
EXPECTED OUTPUT:
column_name people people2 people3
1 !!! age int num int
2 height num num num
3 lastnames chr chr chr
4 names chr chr chr
column_name A B C
1 !!! age int num int
2 height num num num
3 lastnames chr chr chr
4 names chr chr chr
ANSWER
You can handle the direct input of data frames, both named and unnamed, like this:
helper_df_compare = function( ..., a_default_arg = "def" ){
dots <- rlang::list2(...)
args <- as.list(match.call())[-1]
if(is.null(names(dots))) names(dots) <- rep('', length(dots))
for(i in seq_along(dots)) {
if(!nzchar(names(dots)[i])) names(dots)[i] <- as.character(args[[i]])
}
rbind(
do.call(janitor::compare_df_cols, c(dots, return = "mismatch")) %>%
mutate( column_name = paste("!!!", column_name) ),
do.call(janitor::compare_df_cols, c(dots, return = "match"))
) %>%
mutate_all( ~str_replace_all(.,c(
"integer"="int", "numeric"="num", "character"="chr", "factor"="fct",
"POSIXct, POSIXt"="POSIXct"
) ) )
}
This allows
helper_df_compare(people, people2, people3)
#> column_name people people2 people3
#> 1 !!! age int num int
#> 2 height num num num
#> 3 lastnames chr chr chr
#> 4 names chr chr chr
and this:
helper_df_compare(A = people, B = people2, C = people3)
#> column_name A B C
#> 1 !!! age int num int
#> 2 height num num num
#> 3 lastnames chr chr chr
#> 4 names chr chr chr
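The version above covers data frames passed directly. The question also asks for a single list() argument to be accepted, so here is a minimal base-R sketch of how the naming step could be extended to that case. The function name name_inputs and the fallback names df_1, df_2, ... are hypothetical, chosen only for illustration; the idea is to inspect the unevaluated argument expressions and, when the only argument is an inline list(...) call, recover the variable names from inside that call.

```r
# Hypothetical helper (not part of janitor): returns a named list of data
# frames from either direct arguments or one list() argument.
name_inputs <- function(...) {
  dots  <- list(...)
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated argument expressions

  # Single argument that is a plain list (not a data frame): unwrap it
  if (length(dots) == 1 && is.list(dots[[1]]) && !is.data.frame(dots[[1]])) {
    inner <- exprs[[1]]
    dots  <- dots[[1]]
    if (is.call(inner) && identical(inner[[1]], quote(list))) {
      exprs <- as.list(inner)[-1]              # written inline as list(a, b, ...)
    } else {
      exprs <- vector("list", length(dots))    # a list variable: nothing to recover
    }
  }

  nms <- names(dots)
  if (is.null(nms)) nms <- rep("", length(dots))
  for (i in seq_along(dots)) {
    if (!nzchar(nms[i])) {
      # Use the variable name when the argument was a bare symbol,
      # otherwise fall back to a positional placeholder (an assumption)
      nms[i] <- if (is.symbol(exprs[[i]])) as.character(exprs[[i]]) else paste0("df_", i)
    }
  }
  names(dots) <- nms
  dots
}

# Small stand-in tables for the demonstration
people  <- data.frame(age = 1:4)
people2 <- data.frame(age = c(20, 40, 60, 80))

names(name_inputs(people, people2))          # "people" "people2"
names(name_inputs(list(people, people2)))    # "people" "people2"
names(name_inputs(A = people, B = people2))  # "A" "B"
```

Since janitor::compare_df_cols accepts a named list of data frames (the list("A"=people, ...) call in the question already produces columns A, B, C), the result of this step could then be passed to it in place of the dots. An unnamed list stored in a variable cannot have its element names recovered this way, which is why the sketch falls back to placeholders there.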