I am importing a dataset from a third party and would like to validate that all of the columns in the incoming dataset are named as agreed and expected. To do this, I intended to use the verify function from R's assertr package together with has_all_names. I can accomplish this with no problem if I manually enter the column names to be verified, but I can't get it to work when I pass in a vector containing the names of the columns to be verified. For example, using the built-in iris dataset, I can verify the existence of all the column names if I type the names as arguments to the has_all_names function, but if I store the names in a vector and use that for verification, it does not work:
#Create a sample list of column names to be verified
#In my real work, I obtain this list from a database
(names(iris)->expected_variable_names)
Which outputs:
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
But when I then run the following:
library(dplyr)    # provides the %>% pipe
library(assertr)  # provides verify() and has_all_names()
#This works:
iris %>% verify(has_all_names("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"))
#But this does not:
iris %>% verify(has_all_names(expected_variable_names))
Running the line that does not work generates:
verification [has_all_names(expected_variable_names)] failed! (1 failure)
verb redux_fn predicate column index value
1 verify NA has_all_names(expected_variable_names) NA 1 NA
Error: assertr stopped execution
The failure indicates that not all of the column names were found in the data frame, but since I'm passing in exactly the variable names that are in the dataset, it should succeed. How can I pass a vector, or possibly even a list, of column names to verify for validation? I've tried a number of variations of this last attempt with no success.
Thanks.
Answer:
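We may use invoke from purrr, which calls a function with the elements of a vector or list as its arguments (note that invoke() has been superseded in recent purrr releases in favour of rlang::exec(), shown further down):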
library(purrr)
library(dplyr)
library(assertr)
iris %>%
  verify(invoke(has_all_names, expected_variable_names))
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
...
Or with exec from rlang, using !!! to splice the vector into separate arguments:
library(rlang)
iris %>%
  verify(exec(has_all_names, !!!expected_variable_names))
Or with do.call from base R, which needs the arguments as a list, hence as.list():
iris %>%
  verify(do.call(has_all_names,
                 as.list(expected_variable_names)))
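To see why the single-vector call fails while the three variants above succeed, here is a minimal base R sketch. It assumes that has_all_names() gathers its arguments with list(...) and compares them against the column names with %in%; check_names() is a hypothetical stand-in written only for illustration, not assertr's actual source.

```r
# Hypothetical stand-in for a "..."-based name check (an assumption about
# how such a check behaves, NOT assertr's real implementation).
check_names <- function(df, ...) {
  wanted <- list(...)            # each argument becomes one list element
  all(wanted %in% names(df))     # every element must match a column name
}

expected_variable_names <- names(iris)

# Passed as ONE argument: the list holds a single length-5 vector, which
# matches no individual column name, so the check fails.
check_names(iris, expected_variable_names)
#> [1] FALSE

# Spliced with do.call: each name arrives as its own argument.
do.call(check_names, c(list(iris), as.list(expected_variable_names)))
#> [1] TRUE
```

In other words, invoke(), exec() with !!!, and do.call() all do the same job here: they turn each element of the vector into a separate argument, which is the form a "..."-based check like has_all_names() expects.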