Home > Mobile >  Filter rows by groups where all values are NA
Filter rows by groups where all values are NA

Time:12-04

I have difficulty removing rows in grouped data where all values are NA:

df <- data.frame(
  File = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  value = c(1.111, 1.222, 1.121, 3.325, NA, 3.455, NA, NA, NA)
)

Filtering with if_any on the condition that any value is NA works as expected:

library(dplyr)
df %>%
#  group_by(File) %>%
  filter(!if_any(value, is.na))
  File value
1    A 1.111
2    A 1.222
3    A 1.121
4    B 3.325
5    B 3.455

But filtering with if_all on the condition that all values of a group be NAdoes not work as expected - I get the same result as for if_any:

df %>%
  group_by(File) %>%
  filter(!if_all(value, is.na))
# A tibble: 5 × 2
# Groups:   File [2]
  File  value
  <chr> <dbl>
1 A      1.11
2 A      1.22
3 A      1.12
4 B      3.32
5 B      3.46

The expected result is:

1 A      1.11
2 A      1.22
3 A      1.12

Where am I going wrong?

CodePudding user response:

is this what you are looking for:

df %>%
  group_by(File) %>%
  filter(all(!is.na(value)))

You group by File and only return groups rows where there are no NA values. This matches your expected output.

But in your question you state that you only want to remove rows where all values are NA. You could achieve that with

df %>%
  group_by(File) %>%
  filter(any(!is.na(value)))

if_any and if_all are intended to use within filer when you want to filter across multiple columns. This is not what you want I guess. Here is an example on how you would use if_all:

df2 <- tibble(
  x1 = c(TRUE, TRUE, FALSE, FALSE),
  x2 = c(TRUE, FALSE, TRUE, FALSE),
)


df2 %>%
  filter(if_all(x1:x2, ~.)) #rows where x1 and x2 are TRUE

df2 %>%
  filter(if_any(x1:x2, ~.)) #rows where x1 or x2 are TRUE

CodePudding user response:

This was my approach :

my_df <- data.frame(
File = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
value = c(1.111, 1.222, 1.121, 3.325, NA, 3.455, NA, NA, NA)
)

my_df2 <- my_df %>% group_by(File) %>% filter(n() == sum(!is.na(value)))

Edit : My bad I thought you wanted only group where the value was equal to NA but it was the opposite so by adding the ! before the is.na will give you the expected result

  • Related