Home > OS >  How to check if multiple columns are row-wise equivalent, ignoring NA values
How to check if multiple columns are row-wise equivalent, ignoring NA values

Time:12-09

How can I check that a subset of columns is row-wise equivalent, while treating NA as missing data to be ignored? The special case of a row containing exclusively NA should return NA

library(dplyr)
library(tibble)

dat <- tribble(
  ~a, ~b, ~c, ~d,
  4, "blue", "blue", NA,
  7, NA, "red", "green",
  8, "green", "green", "green",
  6, "blue", NA, NA,
  1, NA, NA, NA
)

dat
#> # A tibble: 5 × 4
#>       a b     c     d    
#>   <dbl> <chr> <chr> <chr>
#> 1     4 blue  blue  <NA> 
#> 2     7 <NA>  red   green
#> 3     8 green green green
#> 4     6 blue  <NA>  <NA> 
#> 5     1 <NA>  <NA>  <NA>

e.g. with the sample data above, checking that columns b:d are in agreement, the output would be:

expected <- tribble(
  ~a, ~b, ~c, ~d, ~agreement,
  4, "blue", "blue", NA, TRUE,
  7, NA, "red", "green", FALSE,
  8, "green", "green", "green", TRUE,
  6, "blue", NA, NA, TRUE,
  1, NA, NA, NA, NA
)

expected
#> # A tibble: 5 × 5
#>       a b     c     d     agreement
#>   <dbl> <chr> <chr> <chr> <lgl>    
#> 1     4 blue  blue  <NA>  TRUE     
#> 2     7 <NA>  red   green FALSE    
#> 3     8 green green green TRUE     
#> 4     6 blue  <NA>  <NA>  TRUE     
#> 5     1 <NA>  <NA>  <NA>  NA

Created on 2022-12-08 by the reprex package (v2.0.1)

CodePudding user response:

in two steps, we can:

  1. make a list-column containing a vector of the relevant columns
  2. use a case_when to test for each of the three possible outputs: TRUE FALSE and NA
# for `discard()`
library(purrr)

foo <- dat %>% 
 rowwise() %>%
 mutate(ratings=list(c_across(b:d)))

foo
#> # A tibble: 5 × 5
#> # Rowwise: 
#>       a b     c     d     ratings  
#>   <dbl> <chr> <chr> <chr> <list>   
#> 1     4 blue  blue  <NA>  <chr [3]>
#> 2     7 <NA>  red   green <chr [3]>
#> 3     8 green green green <chr [3]>
#> 4     6 blue  <NA>  <NA>  <chr [3]>
#> 5     1 <NA>  <NA>  <NA>  <chr [3]>

foo %>%
  mutate(agreement=case_when(
    all(is.na(ratings)) ~ NA,
    length(unique(ratings %>% discard(is.na)))==1 ~ TRUE,
    TRUE ~ FALSE # when neither of the above are TRUE, we don't have agreement
  ))
#> # A tibble: 5 × 6
#> # Rowwise: 
#>       a b     c     d     ratings   agreement
#>   <dbl> <chr> <chr> <chr> <list>    <lgl>    
#> 1     4 blue  blue  <NA>  <chr [3]> TRUE     
#> 2     7 <NA>  red   green <chr [3]> FALSE    
#> 3     8 green green green <chr [3]> TRUE     
#> 4     6 blue  <NA>  <NA>  <chr [3]> TRUE     
#> 5     1 <NA>  <NA>  <NA>  <chr [3]> NA

Created on 2022-12-08 by the reprex package (v2.0.1)

CodePudding user response:

You can check whether the parallel min and max values are equal:

library(dplyr)

dat %>%
  mutate(agreement = do.call(`==`, lapply(c(pmin, pmax), do.call, c(across(b:d), na.rm = TRUE))))

# A tibble: 5 × 5
      a b     c     d     agreement
  <dbl> <chr> <chr> <chr> <lgl>    
1     4 blue  blue  NA    TRUE     
2     7 NA    red   green FALSE    
3     8 green green green TRUE     
4     6 blue  NA    NA    TRUE     
5     1 NA    NA    NA    NA        
  • Related