Another "how to deal with NAs in logical statements" question

Time:03-10

Short version: I need to get a results column r like this, ideally using dplyr (but I'm happy with base R as well):

d <- tibble(c1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, NA),
            c2 = c(TRUE, FALSE, FALSE, FALSE, FALSE, NA),
            c3 = c(TRUE, FALSE, FALSE, NA, NA, NA))
d %>% rowwise() %>% mutate(r = something())
# A tibble: 6 x 4
  c1    c2    c3    r
  <lgl> <lgl> <lgl> <lgl>
1 TRUE  TRUE  TRUE  TRUE
2 TRUE  FALSE FALSE TRUE
3 FALSE FALSE FALSE FALSE
4 TRUE  FALSE NA    TRUE
5 FALSE FALSE NA    FALSE
6 NA    NA    NA    NA

I understand why NA | FALSE evaluates to NA. Each TRUE/FALSE in this table is the result of a comparison, and I would really like to keep the syntax as short as possible.
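For reference, here is a minimal base R sketch of the NA behaviour in question and of one possible way to get r. The helper name any_or_na is made up for illustration, and d is rebuilt as a plain data.frame rather than a tibble so that the snippet needs no packages.

```r
# Three-valued logic: the result is NA only when the missing value could change the answer
NA | TRUE                         # TRUE
NA | FALSE                        # NA
any(c(FALSE, NA))                 # NA
any(c(FALSE, NA), na.rm = TRUE)   # FALSE
any(c(NA, NA), na.rm = TRUE)      # FALSE, so all-NA rows need a special case

d <- data.frame(c1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, NA),
                c2 = c(TRUE, FALSE, FALSE, FALSE, FALSE, NA),
                c3 = c(TRUE, FALSE, FALSE, NA, NA, NA))

# Hypothetical helper: NA if everything is missing, otherwise any() ignoring NAs
any_or_na <- function(x) if (all(is.na(x))) NA else any(x, na.rm = TRUE)

# Apply the helper to each row of the logical matrix
r <- apply(as.matrix(d), 1, any_or_na)
r
# TRUE TRUE FALSE TRUE FALSE NA
```

This is not the shortest possible syntax, but it makes the all-NA rule explicit instead of relying on how | propagates NA.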

Long version: I have survey results, and need to create a summary of three questions asking for the primary, secondary and tertiary 'route to something' (there are more than 3 levels in reality). The summary should tell me, for each respondent, whether they made use of route A, route B, etc. Not all respondents filled in all questions, so there might be NAs. Some respondents didn't answer any of the questions at all, and their summary should be NA. So I have:

df <- tibble(primary   = c("C", "A", "B", "D", NA),
             secondary = c("B", "D", "C", NA,  NA),
             tertiary  = c("A", "E", NA,  NA,  NA))

# I think I need something along these lines:
df <- df %>% rowwise() %>%
  mutate(
    routeA = (primary == "A") | (secondary == "A") | (tertiary == "A") ...
    routeB = ....
  )
# Result expected
df
# A tibble:
primary secondary tertiary routeA routeB ...
<chr>   <chr>     <chr>    <lgl>  <lgl>
C       B         A        TRUE   TRUE
A       D         E        TRUE   FALSE
B       C         NA       FALSE  TRUE
D       NA        NA       FALSE  FALSE
NA      NA        NA       NA     NA

CodePudding user response:

In base R, you can do this relatively efficiently with apply-match:

f <- function(x, levels) {
    if (all(is.na(x))) {
        rep.int(NA, length(levels))
    } else {
        as.logical(match(levels, x, 0L))
    }
}

lvs <- LETTERS[1:4]
df[paste0("route", lvs)] <- t(apply(df, 1L, f, levels = lvs))
df
## # A tibble: 5 × 7
##   primary secondary tertiary routeA routeB routeC routeD
##   <chr>   <chr>     <chr>    <lgl>  <lgl>  <lgl>  <lgl> 
## 1 C       B         A        TRUE   TRUE   TRUE   FALSE 
## 2 A       D         E        TRUE   FALSE  FALSE  TRUE  
## 3 B       C         NA       FALSE  TRUE   TRUE   FALSE 
## 4 D       NA        NA       FALSE  FALSE  FALSE  TRUE  
## 5 NA      NA        NA       NA     NA     NA     NA    

I say "relatively" because rowwise operations on data frames tend not to be very efficient, requiring coercions to and from matrix or reshaping to and from long format. (This case is no exception, as apply coerces from data frame to matrix and the assignment coerces from matrix back to data frame.)
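If the per-row function calls are a concern, one alternative is to convert to a matrix once and work on whole columns with rowSums. The following is only a sketch of that idea, not a benchmarked replacement: the names m, all_na and routes are illustrative, and df is rebuilt as a plain data.frame so that the snippet runs without packages.

```r
df <- data.frame(primary   = c("C", "A", "B", "D", NA),
                 secondary = c("B", "D", "C", NA,  NA),
                 tertiary  = c("A", "E", NA,  NA,  NA),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)                      # one coercion to a character matrix
all_na <- rowSums(!is.na(m)) == 0       # respondents who answered nothing
lvs <- LETTERS[1:4]

# For each level, flag rows where at least one answer equals that level
routes <- sapply(lvs, function(lv) {
  hit <- rowSums(m == lv, na.rm = TRUE) > 0
  hit[all_na] <- NA                     # all-missing rows stay NA
  hit
})
colnames(routes) <- paste0("route", lvs)

res <- cbind(df, routes)
res$routeA
# TRUE TRUE FALSE FALSE NA
```

The comparison m == lv is done once per level over the whole matrix, so the number of R-level function calls grows with the number of levels rather than with the number of rows.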

CodePudding user response:

Suboptimal:

my_match <- function(x, val) {
   if (all(is.na(x))) return(NA)
   return(any(na.omit(x) == val))
}

df %>% rowwise() %>% mutate(rA = my_match(c_across(where(is.character)), "A"),
                            rB = my_match(c_across(where(is.character)), "B"))

To be improved:

  • this won't scale well to larger numbers of routes
  • too much repeated code (another way of saying the same thing). I'm not quite sure how to create a function/shortcut version of this: one could loop over the possible routes, adding one column at a time, but I don't feel like going as far as necessary down the rlang/tidy-evaluation/NSE rabbit hole right now ...
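The loop mentioned in the last point can be sketched in base R without any tidy evaluation. This is a minimal illustration rather than a dplyr solution: route_cols and routes are names introduced here, my_match is repeated from above so the snippet is self-contained, and df is rebuilt as a plain data.frame.

```r
df <- data.frame(primary   = c("C", "A", "B", "D", NA),
                 secondary = c("B", "D", "C", NA,  NA),
                 tertiary  = c("A", "E", NA,  NA,  NA),
                 stringsAsFactors = FALSE)

# Same helper as above: NA for all-missing rows, otherwise "does any answer match?"
my_match <- function(x, val) {
  if (all(is.na(x))) return(NA)
  any(na.omit(x) == val)
}

route_cols <- c("primary", "secondary", "tertiary")  # fixed, so new columns are not re-scanned
routes <- c("A", "B")                                # extend as needed

# Add one logical column per route
for (rt in routes) {
  df[[paste0("r", rt)]] <- apply(df[route_cols], 1, my_match, val = rt)
}

df$rA
# TRUE TRUE FALSE FALSE NA
```

Keeping route_cols fixed matters: selecting all character columns inside the loop would be fine here, but selecting all columns would start feeding the freshly added rA into the computation of rB.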

CodePudding user response:

As mentioned in the comments, this is straightforward when the data is reshaped to long format and then back to wide.

library(tidyr)
library(dplyr)
library(tibble)

df <- df %>%
  rowid_to_column() 

df %>%
  pivot_longer(-rowid) %>%
  filter(!is.na(value)) %>%
  pivot_wider(id_cols = rowid, names_from = value, values_fill = FALSE, values_fn = ~ TRUE, names_sort = TRUE) %>%
  left_join(df, ., by  = "rowid")

# A tibble: 5 x 9
  rowid primary secondary tertiary A     B     C     D     E    
  <int> <chr>   <chr>     <chr>    <lgl> <lgl> <lgl> <lgl> <lgl>
1     1 C       B         A        TRUE  TRUE  TRUE  FALSE FALSE
2     2 A       D         E        TRUE  FALSE FALSE TRUE  TRUE 
3     3 B       C         NA       FALSE TRUE  TRUE  FALSE FALSE
4     4 D       NA        NA       FALSE FALSE FALSE TRUE  FALSE
5     5 NA      NA        NA       NA    NA    NA    NA    NA   