Is there a way to filter out an entire group from a tibble based on the rows within that group?

Time:11-10

If I have a tibble where each row represents a component of some object, and multiple components share an object, is there a way to analyze all of a given object's components and remove its rows if the object doesn't meet some condition?

For example, let's say I want to clean up the table below:

tib <- tibble(object = c("a", "a", "a", "a", "b", "b", "b"),
       component = c("x", "x", "y", "z", "x", "y", "y"),
       data = 1:7)

I know an object must contain exactly one component "x", so object "a" is not valid because it has two. All four rows corresponding to object "a" need to be removed.

I know that the filter() function can work on whole groups but I'm struggling to find a way to analyze the group within the filter function. The closest I think I've come is below but it doesn't work at all. Maybe I'm completely off.

tib %>%
  group_by("object") %>%
  filter(count(cur_data(), component)$b != 1)

CodePudding user response:
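
One likely reason the attempt in the question does nothing useful is that group_by("object") receives a string rather than a column name, so dplyr treats it as a constant and puts every row into a single group. In addition, count() stores its result in a column named n, so $b does not exist. A minimal sketch of the grouping difference (the variable names groups_quoted and groups_bare are only for illustration):

```r
library(dplyr)

tib <- tibble(object    = c("a", "a", "a", "a", "b", "b", "b"),
              component = c("x", "x", "y", "z", "x", "y", "y"),
              data      = 1:7)

# A quoted name is evaluated as a constant string, so all rows share one group.
groups_quoted <- n_groups(group_by(tib, "object"))

# An unquoted name refers to the column, giving one group per object.
groups_bare <- n_groups(group_by(tib, object))

groups_quoted  # 1
groups_bare    # 2
```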

We can check the condition sum(component == "x") == 1 in each group, which keeps only the objects with exactly one "x":

library(dplyr)

tib %>%
  group_by(object) %>%
  filter(sum(component == "x") == 1)

#> # A tibble: 3 x 3
#> # Groups:   object [1]
#>   object component  data
#>   <chr>  <chr>     <int>
#> 1 b      x             5
#> 2 b      y             6
#> 3 b      y             7
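
Since the question asks for exactly one "x", it may be worth checking how a group with no "x" at all is handled. The sketch below assumes a hypothetical extra object "c" (not in the original data) that has no "x" component. Comparing the count with == 1 drops it, while < 2 would keep it.

```r
library(dplyr)

# Hypothetical extension of the question's data: object "c" has no "x".
tib2 <- tibble(object    = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
               component = c("x", "x", "y", "z", "x", "y", "y", "y", "z"),
               data      = 1:9)

# Keep objects with exactly one "x".
exactly_one <- tib2 %>%
  group_by(object) %>%
  filter(sum(component == "x") == 1) %>%
  ungroup()

# Keep objects with at most one "x" (zero is also allowed).
at_most_one <- tib2 %>%
  group_by(object) %>%
  filter(sum(component == "x") < 2) %>%
  ungroup()

unique(exactly_one$object)  # "b"
unique(at_most_one$object)  # "b" "c"
```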

Alternatively, we can use unlist(table(component))["x"] to count how often component == "x" occurs in each group, and then keep only the groups where that count equals 1. This approach is more flexible when we want to check the occurrence of more than one component value.

library(dplyr)

tib %>% 
  group_by(object) %>% 
  filter(unlist(table(component))["x"] == 1L) 

#> # A tibble: 3 x 3
#> # Groups:   object [1]
#>   object component  data
#>   <chr>  <chr>     <int>
#> 1 b      x             5
#> 2 b      y             6
#> 3 b      y             7
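
To illustrate the flexibility mentioned above, here is a rough sketch that checks several component values at once. The rule in required (exactly one "x" and exactly two "y") is a made-up example, not something from the question. Converting component to a factor with fixed levels makes table() report a count of 0 for a missing value instead of omitting it.

```r
library(dplyr)

tib <- tibble(object    = c("a", "a", "a", "a", "b", "b", "b"),
              component = c("x", "x", "y", "z", "x", "y", "y"),
              data      = 1:7)

# Hypothetical rule: required number of occurrences per component value.
required <- c(x = 1L, y = 2L)

kept <- tib %>%
  group_by(object) %>%
  # Count only the components named in `required`, in the same order,
  # and keep the group if every count matches.
  filter(all(as.vector(table(factor(component, levels = names(required)))) == required)) %>%
  ungroup()

unique(kept$object)  # "b"
```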

Created on 2022-11-10 by the reprex package (v2.0.1)
