Passing vector of names to verify to assertr's verify in R

Time:05-28

I am importing a dataset from a third party and would like to validate that all of the columns in the incoming dataset are named as agreed and expected. To do this, I intended to use the verify function from the assertr package in R together with has_all_names. This works fine if I manually enter the column names to be verified, but not if I pass in a vector that contains those names. For example, using the built-in iris dataset, I can verify the existence of all the column names if I type them in as arguments to has_all_names, but if I have the names stored in a vector and use that for verification, it does not work:

#Create a sample list of column names to be verified
#In my real work, I obtain this list from a database
(expected_variable_names <- names(iris))

Which outputs:

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" "Species"

But then I run the following:

#This works:
iris %>% verify(has_all_names("Sepal.Length", "Sepal.Width",  "Petal.Length", "Petal.Width",  "Species"))

#But this does not:
iris %>% verify(has_all_names(expected_variable_names))

Running the line that does not work produces:

verification [has_all_names(expected_variable_names)] failed! (1 failure)

    verb redux_fn                              predicate column index value
1 verify       NA has_all_names(expected_variable_names)     NA     1    NA

Error: assertr stopped execution

The failure indicates that not all of the column names were found in the data frame, but since I'm passing in variable names that are all present in the dataset, it should succeed. How can I pass a vector, or possibly even a list, of column names into verify to validate them? I've tried a number of variations of this last attempt with no success.

Thanks.

CodePudding user response:

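We may use invoke from purrr, which calls has_all_names with each element of the vector as a separate argument (note that invoke is deprecated as of purrr 1.0.0 in favor of exec, shown further down)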

library(purrr)
library(dplyr)
library(assertr)
iris %>% 
    verify(invoke(has_all_names, expected_variable_names))

-output

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
...

Or with exec from rlang, splicing the vector into separate arguments with !!!

library(rlang)
iris %>% 
    verify(exec(has_all_names, !!!expected_variable_names))

Or with do.call from base R

iris %>% 
    verify(do.call(has_all_names, as.list(expected_variable_names)))
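
All three approaches above do the same thing: they turn the vector into separate arguments. The base R sketch below illustrates why that matters. It assumes only that has_all_names collects the names through `...`, as its signature has_all_names(...) suggests. The function has_all_names_sketch is a hypothetical stand-in written for this illustration, not assertr's actual code, and it takes the data frame explicitly instead of finding it through verify.

```r
# Hypothetical stand-in (NOT assertr's implementation): gathers the
# requested names through `...` and checks them against names(df).
has_all_names_sketch <- function(df, ...) {
  wanted <- list(...)            # one list element per argument supplied
  all(wanted %in% names(df))
}

expected_variable_names <- names(iris)

# The whole vector arrives as ONE argument, so `wanted` is a list of
# length 1 holding a 5-element vector, which matches no single column name.
as_one_argument <- has_all_names_sketch(iris, expected_variable_names)

# do.call() spreads the vector out, one argument per name, which is
# equivalent to typing the five names by hand.
as_separate_arguments <- do.call(
  has_all_names_sketch,
  c(list(iris), as.list(expected_variable_names))
)

print(as_one_argument)        # FALSE
print(as_separate_arguments)  # TRUE
```

If the real function behaves like this stand-in, the failure in the question is not about missing columns: the vector is simply treated as a single name.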