My dataset consists of numeric columns, with some missing (i.e. NA) values. I want to find the rows that consist entirely of NA values. For example:
library(tidyverse)
library(magrittr)
frm <- tribble(
~A, ~B, ~C,
11, 22, 33,
14, NA, 37,
10, 29, 36,
NA, NA, NA,
18, 28, 38
)
I could process each row in a for-loop, using is.na() and all(), but I'd like to find a "tidy" solution. This is the best I could do:
frm %>%
  rowwise() %>%
  summarize(all_values_missing = is.na(A) & is.na(B) & is.na(C))
But this approach doesn't scale to datasets with many columns or nontrivial column names. Any ideas would be much appreciated!
CodePudding user response:
Use the c_across() function to help. For example:
frm %>%
  rowwise() %>%
  summarize(all_values_missing = all(is.na(c_across(everything()))))
If you only need a subset of columns, c_across() accepts tidy selectors as well.
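As a rough sketch of the subset case, assuming you only care about columns A and B (an arbitrary choice made here for illustration, not something from the question), the selection goes inside c_across():

```r
library(dplyr)
library(tibble)

# Same example data as in the question
frm <- tribble(
  ~A, ~B, ~C,
  11, 22, 33,
  14, NA, 37,
  10, 29, 36,
  NA, NA, NA,
  18, 28, 38
)

# Check only columns A and B (hypothetical subset for illustration).
# Other tidyselect expressions, e.g. starts_with("A"), can be used the same way.
subset_result <- frm %>%
  rowwise() %>%
  summarize(ab_missing = all(is.na(c_across(c(A, B)))))
```

Here only the fourth row is flagged, because it is the only row where both A and B are missing.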
CodePudding user response:
What do you want your return value to be?
You can use pmap_lgl() to iterate row-wise over your data frame and return a logical vector indicating whether each row is all NA. Inside the formula, c(...) collects every value in the current row; .x on its own would refer only to the first column:
> pmap_lgl(frm, ~ all(is.na(c(...))))
[1] FALSE FALSE FALSE TRUE FALSE
You can embed this in your data frame in a tidy manner via:
> frm %>%
  mutate(all_na = pmap_lgl(., ~ all(is.na(c(...)))))
# A tibble: 5 × 4
      A     B     C all_na
  <dbl> <dbl> <dbl> <lgl>
1    11    22    33 FALSE
2    14    NA    37 FALSE
3    10    29    36 FALSE
4    NA    NA    NA TRUE
5    18    28    38 FALSE
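If the goal is to drop the all-NA rows rather than just flag them, the same logical vector can be used for subsetting. A minimal sketch, reusing the frm data from the question (the names all_na and frm_clean are just illustrative):

```r
library(purrr)
library(tibble)

# Same example data as in the question
frm <- tribble(
  ~A, ~B, ~C,
  11, 22, 33,
  14, NA, 37,
  10, 29, 36,
  NA, NA, NA,
  18, 28, 38
)

# c(...) gathers every column's value for the current row
all_na <- pmap_lgl(frm, ~ all(is.na(c(...))))

# Keep only the rows that have at least one non-missing value
frm_clean <- frm[!all_na, ]
```

Rows that are only partly missing, such as the second one, are kept; only the row where every column is NA is removed.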