How can I check that a subset of columns is row-wise equivalent, while treating NA
as missing data to be ignored? The special case of a row containing exclusively NA
should return NA
library(dplyr)
library(tibble)
dat <- tribble(
~a, ~b, ~c, ~d,
4, "blue", "blue", NA,
7, NA, "red", "green",
8, "green", "green", "green",
6, "blue", NA, NA,
1, NA, NA, NA
)
dat
#> # A tibble: 5 × 4
#> a b c d
#> <dbl> <chr> <chr> <chr>
#> 1 4 blue blue <NA>
#> 2 7 <NA> red green
#> 3 8 green green green
#> 4 6 blue <NA> <NA>
#> 5 1 <NA> <NA> <NA>
e.g. with the sample data above, checking that columns b:d
are in agreement, the output would be:
expected <- tribble(
~a, ~b, ~c, ~d, ~agreement,
4, "blue", "blue", NA, TRUE,
7, NA, "red", "green", FALSE,
8, "green", "green", "green", TRUE,
6, "blue", NA, NA, TRUE,
1, NA, NA, NA, NA
)
expected
#> # A tibble: 5 × 5
#> a b c d agreement
#> <dbl> <chr> <chr> <chr> <lgl>
#> 1 4 blue blue <NA> TRUE
#> 2 7 <NA> red green FALSE
#> 3 8 green green green TRUE
#> 4 6 blue <NA> <NA> TRUE
#> 5 1 <NA> <NA> <NA> NA
Created on 2022-12-08 by the reprex package (v2.0.1)
CodePudding user response:
in two steps, we can:
- make a list-column containing a vector of the relevant columns
- use a
case_when
to test for each of the three possible outputs:TRUE
FALSE
andNA
# for `discard()`
library(purrr)
foo <- dat %>%
rowwise() %>%
mutate(ratings=list(c_across(b:d)))
foo
#> # A tibble: 5 × 5
#> # Rowwise:
#> a b c d ratings
#> <dbl> <chr> <chr> <chr> <list>
#> 1 4 blue blue <NA> <chr [3]>
#> 2 7 <NA> red green <chr [3]>
#> 3 8 green green green <chr [3]>
#> 4 6 blue <NA> <NA> <chr [3]>
#> 5 1 <NA> <NA> <NA> <chr [3]>
foo %>%
mutate(agreement=case_when(
all(is.na(ratings)) ~ NA,
length(unique(ratings %>% discard(is.na)))==1 ~ TRUE,
TRUE ~ FALSE # when neither of the above are TRUE, we don't have agreement
))
#> # A tibble: 5 × 6
#> # Rowwise:
#> a b c d ratings agreement
#> <dbl> <chr> <chr> <chr> <list> <lgl>
#> 1 4 blue blue <NA> <chr [3]> TRUE
#> 2 7 <NA> red green <chr [3]> FALSE
#> 3 8 green green green <chr [3]> TRUE
#> 4 6 blue <NA> <NA> <chr [3]> TRUE
#> 5 1 <NA> <NA> <NA> <chr [3]> NA
Created on 2022-12-08 by the reprex package (v2.0.1)
CodePudding user response:
You can check whether the parallel min and max values are equal:
library(dplyr)
dat %>%
mutate(agreement = do.call(`==`, lapply(c(pmin, pmax), do.call, c(across(b:d), na.rm = TRUE))))
# A tibble: 5 × 5
a b c d agreement
<dbl> <chr> <chr> <chr> <lgl>
1 4 blue blue NA TRUE
2 7 NA red green FALSE
3 8 green green green TRUE
4 6 blue NA NA TRUE
5 1 NA NA NA NA