Home > OS >  How to count rows with NA values across a selection of columns and include 0 count?
How to count rows with NA values across a selection of columns and include 0 count?

Time:07-03

I am trying to count the number of species per region which have missing data (NA) for a selection of variables.

Here is an example of my dataframe:

df <- structure(
  list(
    ID = c("AL01", "AL01", "AL02", "AL02", "AL03", "AL03"),
    Species = c("Sp1",
                "Sp2",
                "Sp3",
                "Sp4",
                "Sp5",
                "Sp6"),
    Var1 = c("A", NA, NA, NA, "B", "B"),
    Var2 = c(NA,
                  "A",
                  "B",
                  "C",
                  "B",
                  "C"),
    Var3 = c(NA,
             2.71, 2.86, 3.21, 2.87, 3.05),
    Var4 = c("S", NA,
             "C", NA, "S",
             "C")
  ),
  class = "data.frame",
  row.names = c(NA,
                6L)
)

I can get the count of species with NA for any of Var2, Var3 of Var4 by running:

df %>% 
  filter_at(
    vars(
      Var2,
      Var3,
      Var4
    ),
    any_vars(is.na(.))
  ) %>% 
  group_by(ID) %>% 
  count()


# A tibble: 2 × 2
# Groups:   ID [2]
  ID        n
  <chr> <int>
1 AL01      2
2 AL02      1

However this only shows me AL01 and AL02 and I would also like to include AL03 for which the count is 0. I have tried this code which I thought should work:

df %>%
  group_by(ID) %>% 
  summarise_at(vars(
    Var2,
    Var3,
    Var4
  ), ~ sum(any_vars(is.na(.))))

But I get this error:

Error in `summarise()`:
! Problem while computing `Var2 = (structure(function (..., .x = ..1, .y = ..2, . =
  ..1) ...`.
ℹ The error occurred in group 1: ID = "AL01".
Caused by error in `abort_quosure_op()`:
! Summary operations are not defined for quosures. Do you need to unquote the
  quosure?

# Bad: sum(myquosure)

# Good: sum(!!myquosure)
Run `rlang::last_error()` to see where the error occurred.

I realise I am not sure exactly how any_vars works and am unclear on how to continue. The output I would like would be:

# A tibble: 2 × 2
# Groups:   ID [2]
  ID        n
  <chr> <int>
1 AL01      2
2 AL02      1
3 AL03      0

CodePudding user response:

You can do:

library(tidyverse)
df %>%
  mutate(missing = apply(across(num_range('Var', 2:4)), 1, function(x) any(is.na(x)))) %>%
  group_by(ID) %>%
  summarize(n = sum(missing))


# A tibble: 3 x 2
  ID        n
  <chr> <int>
1 AL01      2
2 AL02      1
3 AL03      0

CodePudding user response:

df %>% 
  rowwise() %>% 
  mutate(across(num_range('Var', 2:4), is.na), 
         x = any(c_across(num_range('Var', 2:4)))) %>% 
  group_by(ID) %>% 
  summarise(n = sum(x))

# A tibble: 3 × 2
  ID        n
  <chr> <int>
1 AL01      2
2 AL02      1
3 AL03      0
  • Related