I am trying to count the number of species per region which have missing data (NA) for a selection of variables.
Here is an example of my dataframe:
df <- structure(
list(
ID = c("AL01", "AL01", "AL02", "AL02", "AL03", "AL03"),
Species = c("Sp1",
"Sp2",
"Sp3",
"Sp4",
"Sp5",
"Sp6"),
Var1 = c("A", NA, NA, NA, "B", "B"),
Var2 = c(NA,
"A",
"B",
"C",
"B",
"C"),
Var3 = c(NA,
2.71, 2.86, 3.21, 2.87, 3.05),
Var4 = c("S", NA,
"C", NA, "S",
"C")
),
class = "data.frame",
row.names = c(NA,
6L)
)
I can get the count of species with NA for any of Var2, Var3 of Var4 by running:
df %>%
filter_at(
vars(
Var2,
Var3,
Var4
),
any_vars(is.na(.))
) %>%
group_by(ID) %>%
count()
# A tibble: 2 × 2
# Groups: ID [2]
ID n
<chr> <int>
1 AL01 2
2 AL02 1
However this only shows me AL01 and AL02 and I would also like to include AL03 for which the count is 0. I have tried this code which I thought should work:
df %>%
group_by(ID) %>%
summarise_at(vars(
Var2,
Var3,
Var4
), ~ sum(any_vars(is.na(.))))
But I get this error:
Error in `summarise()`:
! Problem while computing `Var2 = (structure(function (..., .x = ..1, .y = ..2, . =
..1) ...`.
ℹ The error occurred in group 1: ID = "AL01".
Caused by error in `abort_quosure_op()`:
! Summary operations are not defined for quosures. Do you need to unquote the
quosure?
# Bad: sum(myquosure)
# Good: sum(!!myquosure)
Run `rlang::last_error()` to see where the error occurred.
I realise I am not sure exactly how any_vars works and am unclear on how to continue. The output I would like would be:
# A tibble: 2 × 2
# Groups: ID [2]
ID n
<chr> <int>
1 AL01 2
2 AL02 1
3 AL03 0
CodePudding user response:
You can do:
library(tidyverse)
df %>%
mutate(missing = apply(across(num_range('Var', 2:4)), 1, function(x) any(is.na(x)))) %>%
group_by(ID) %>%
summarize(n = sum(missing))
# A tibble: 3 x 2
ID n
<chr> <int>
1 AL01 2
2 AL02 1
3 AL03 0
CodePudding user response:
df %>%
rowwise() %>%
mutate(across(num_range('Var', 2:4), is.na),
x = any(c_across(num_range('Var', 2:4)))) %>%
group_by(ID) %>%
summarise(n = sum(x))
# A tibble: 3 × 2
ID n
<chr> <int>
1 AL01 2
2 AL02 1
3 AL03 0