Home > Blockchain >  How to filter out groups empty for 1 column in Tidyverse
How to filter out groups empty for 1 column in Tidyverse

Time:01-08

tibble(
  A = c("A","A","B","B"),
  x = c(NA,NA,NA,1),
  y = c(1,2,3,4),
) %>% group_by(A) -> df

desired output:

tibble(
  A = c("B","B"),
  x = c(NA,1)
  y = c(3,4),
)

I want to find all groups for which all elements of x and x only are all NA, then remove those groups. "B" is filtered in because it has at least 1 non NA element.

I tried:

df %>%
  filter(all(!is.na(x)))

but it seems that filters out if it finds at least 1 NA; I need the correct word, which is not all.

CodePudding user response:

This will remove groups of column A if all elements of x are NA:

library(dplyr)

df %>%
  group_by(A) %>%
  filter(! all(is.na(x)))

# A tibble: 2 × 3
# Groups:   A [1]
#  A         x     y
#  <chr> <dbl> <dbl>
#1 B        NA     3
#2 B         1     4

Note that group "A" was removed because both cells in the column x are not defined.

CodePudding user response:

We can use any with complete.cases

library(dplyr)
df %>% 
  group_by(A) %>% 
  filter(any(complete.cases(x))) %>% 
  ungroup

-output

# A tibble: 2 × 3
  A         x     y
  <chr> <dbl> <dbl>
1 B        NA     3
2 B         1     4

In the devel version of dplyr, we could use .by in filter thus we don't need to group_by/ungroup

df %>%  
  filter(any(complete.cases(x)), .by = 'A')
# A tibble: 2 × 3
  A         x     y
  <chr> <dbl> <dbl>
1 B        NA     3
2 B         1     4
  • Related