I am just getting acquainted with the validate package. Unfortunately, at the very beginning I ran into a problem and I can't find the right solution. I would like to create one validation rule that I can later apply to multiple variables.
Here is an example. I have the following tibble:
library(tidyverse)
library(validate)

df = tibble(
  id = rep(1:10, each = 20),
  name = rep(paste0("v", 1:20), 10),
  value = rnorm(length(name))
) %>%
  # spread the 20 variables into columns v1..v20
  pivot_wider(names_from = name, values_from = value)
Output:
# A tibble: 10 x 21
id v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1.20 0.182 -1.53 2.73 -1.60 -0.976 -0.767 -2.28 -0.257 0.736
2 2 0.484 0.913 -0.873 -0.801 0.172 1.11 -1.71 0.0125 0.0450 0.374
3 3 -0.604 -0.405 0.482 0.998 -0.634 0.212 0.717 0.598 -0.876 0.139
4 4 -0.324 -1.83 0.0195 -1.70 0.506 -0.139 3.21 -0.00169 -0.200 -1.03
5 5 0.268 1.40 0.349 0.667 1.76 0.926 -1.09 -0.487 2.03 0.203
6 6 0.646 0.516 0.849 -0.619 -2.18 0.126 -0.0956 -0.471 0.0342 0.530
7 7 -1.03 -1.27 -0.0716 -2.13 -0.340 1.20 0.746 -0.366 -2.82 -0.431
8 8 0.415 0.313 0.591 -0.0552 0.132 1.86 -0.427 0.390 -0.506 -0.470
9 9 0.309 1.13 -0.472 0.760 -0.549 -0.954 -0.219 -0.653 0.335 -0.870
10 10 1.06 1.30 1.12 0.646 0.279 -1.45 -0.891 -0.278 0.637 0.236
# ... with 10 more variables: v11 <dbl>, v12 <dbl>, v13 <dbl>, v14 <dbl>, v15 <dbl>,
# v16 <dbl>, v17 <dbl>, v18 <dbl>, v19 <dbl>, v20 <dbl>
I can validate one variable using the following rule:
df %>%
  confront(
    validator(
      num.val = is.numeric(v1),
      big.val = !(v1 > 10),
      low.val = !(v1 < -10),
      NA.val = !is.na(v1)
    )
  ) %>%
  summary()
# name items passes fails nNA error warning expression
# 1 num.val 1 1 0 0 FALSE FALSE is.numeric(v1)
# 2 big.val 10 10 0 0 FALSE FALSE v1 <= 10
# 3 low.val 10 10 0 0 FALSE FALSE v1 >= -10
# 4 NA.val 10 10 0 0 FALSE FALSE !is.na(v1)
However, I would like to apply this rule to multiple columns using some simple notation. Unfortunately, the code below does not work.
df %>%
  confront(
    validator(
      num.val = is.numeric(v1:v20),
      big.val = !(v1:v20 > 10),
      low.val = !(v1:v20 < -10),
      NA.val = !is.na(v1:v20)
    )
  ) %>%
  summary()
# name items passes fails nNA error warning expression
# 1 num.val 1 1 0 0 FALSE TRUE is.numeric(v1:v20)
# 2 big.val 1 1 0 0 FALSE TRUE v1:v20 <= 10
# 3 low.val 1 1 0 0 FALSE TRUE v1:v20 >= -10
# 4 NA.val 1 1 0 0 FALSE TRUE !is.na(v1:v20)
I understand that I can always convert my data to long format.
df %>%
  pivot_longer(v1:v20) %>%
  confront(
    validator(
      num.val = is.numeric(value),
      big.val = !(value > 10),
      low.val = !(value < -10),
      NA.val = !is.na(value)
    )
  ) %>%
  summary()
# name items passes fails nNA error warning expression
# 1 num.val 1 1 0 0 FALSE FALSE is.numeric(value)
# 2 big.val 200 200 0 0 FALSE FALSE value <= 10
# 3 low.val 200 200 0 0 FALSE FALSE value >= -10
# 4 NA.val 200 200 0 0 FALSE FALSE !is.na(value)
However, in this case, I will not be able to determine in which variable the validation failed.
Any suggestions on how one can easily apply one validation rule to many selected variables?
CodePudding user response:
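This approach comes from the validate syntax help (?validate::syntax): use . to refer to the whole data set. Note that it gives a different result for num.val, because is.numeric(.) tests the data frame itself rather than its columns. I looked through the Data Validation Cookbook but could not find a simple way to select multiple columns.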
df %>%
  select(-id) %>%
  confront(
    validator(
      num.val = is.numeric(.),
      big.val = !(. > 10),
      low.val = !(. < -10),
      NA.val = !is.na(.)
    )
  ) %>%
  summary()
name items passes fails nNA error warning expression
1 num.val 1 0 1 0 FALSE FALSE is.numeric(.)
2 big.val 200 200 0 0 FALSE FALSE . <= 10
3 low.val 200 200 0 0 FALSE FALSE . >= -10
4 NA.val 200 200 0 0 FALSE FALSE !is.na(.)
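A possible alternative that stays in wide format and still reports each column separately is to generate the rules as text, one per column, and hand them to validator() through its .data argument. This is only a minimal sketch: it assumes the column names v1..v20 from the question, rebuilds a comparable data frame with base R, and the objects vars, checks, grid and rules are illustrative names, not part of validate.

```r
library(validate)

# Wide data comparable to the question's df (values are random)
set.seed(1)
df <- as.data.frame(matrix(rnorm(200), nrow = 10,
                           dimnames = list(NULL, paste0("v", 1:20))))

vars <- names(df)  # columns to validate

# Rule templates; %s is replaced by each column name
checks <- c(num.val = "is.numeric(%s)",
            big.val = "%s <= 10",
            low.val = "%s >= -10",
            NA.val  = "!is.na(%s)")

# One row per (column, check) pair
grid <- expand.grid(var = vars, check = names(checks),
                    stringsAsFactors = FALSE)

rules <- data.frame(
  rule        = unname(sprintf(checks[grid$check], grid$var)),
  name        = paste(grid$check, grid$var, sep = "."),
  description = "",
  stringsAsFactors = FALSE
)

# Build the validator from the rule table and confront the data
v   <- validator(.data = rules)
res <- summary(confront(df, v))
head(res)
```

Each row of res then carries the column name in its rule name (for example big.val.v7), so a failing variable can be read off directly.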
CodePudding user response:
If we modify the OP's pivot_longer code by splitting the long data with group_split(name), each variable is validated separately, so it should work:
library(purrr)
library(dplyr)
library(tidyr)
out <- df %>%
  pivot_longer(v1:v20) %>%
  group_split(name) %>%
  map(~ .x %>%
    confront(
      validator(
        num.val = is.numeric(value),
        big.val = !(value > 10),
        low.val = !(value < -10),
        NA.val = !is.na(value)
      )
    ) %>%
    summary())
-output
> out[1:4]
[[1]]
name items passes fails nNA error warning expression
1 num.val 1 1 0 0 FALSE FALSE is.numeric(value)
2 big.val 10 10 0 0 FALSE FALSE value <= 10
3 low.val 10 10 0 0 FALSE FALSE value >= -10
4 NA.val 10 10 0 0 FALSE FALSE !is.na(value)
[[2]]
name items passes fails nNA error warning expression
1 num.val 1 1 0 0 FALSE FALSE is.numeric(value)
2 big.val 10 10 0 0 FALSE FALSE value <= 10
3 low.val 10 10 0 0 FALSE FALSE value >= -10
4 NA.val 10 10 0 0 FALSE FALSE !is.na(value)
[[3]]
name items passes fails nNA error warning expression
1 num.val 1 1 0 0 FALSE FALSE is.numeric(value)
2 big.val 10 10 0 0 FALSE FALSE value <= 10
3 low.val 10 10 0 0 FALSE FALSE value >= -10
4 NA.val 10 10 0 0 FALSE FALSE !is.na(value)
[[4]]
name items passes fails nNA error warning expression
1 num.val 1 1 0 0 FALSE FALSE is.numeric(value)
2 big.val 10 10 0 0 FALSE FALSE value <= 10
3 low.val 10 10 0 0 FALSE FALSE value >= -10
4 NA.val 10 10 0 0 FALSE FALSE !is.na(value)
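One caveat: group_split() returns an unnamed list whose elements follow the sorted group keys (v1, v10, v11, ...), so out[[2]] is not v2. A small variation, sketched here with base split() so that each summary is named after its variable; the long data frame is rebuilt with base R to keep the example self-contained, and long and rules are illustrative names.

```r
library(validate)

# Long-format data with the same layout as the question's data before pivot_wider()
set.seed(1)
long <- data.frame(
  id    = rep(1:10, each = 20),
  name  = rep(paste0("v", 1:20), 10),
  value = rnorm(200)
)

# Define the rule set once and reuse it for every variable
rules <- validator(
  num.val = is.numeric(value),
  big.val = value <= 10,
  low.val = value >= -10,
  NA.val  = !is.na(value)
)

# split() names each list element after its variable
out <- lapply(split(long, long$name),
              function(d) summary(confront(d, rules)))

out[["v3"]]  # summary for variable v3
```

With a named list, a failing variable can be looked up by name instead of by position.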