Home > Blockchain >  Filter by value counts within groups
Filter by value counts within groups

Time:01-09

I want to filter my grouped dataframe based on the number of occurrences of a specific value within a group.

Some exemplary data:

data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = T), 
                 rt = runif(100,0.2,1),
                 lapse = sample(1:2,100,replace = T))

The “lapse” column is my filter variable in this case. I want to exclude every “ID” group that has more than 15 counts of “lapse” == 2 within!

data %>% group_by(ID) %>% count(lapse == 2)

So, if for example the group “A” has 17 times “lapse” == 2 within it should be filtered entirely from the datafame.

CodePudding user response:

First I created some reproducible data using a set.seed and check the number of values per group. It seems that in this case only group D more values with lapse 2 has. You can use filter and sum the values with lapse 2 per group like this:

set.seed(7)
data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = T), 
                   rt = runif(100,0.2,1),
                   lapse = sample(1:2,100,replace = T))

library(dplyr)
# Check n values per group
data %>%
  group_by(ID, lapse) %>%
  summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups:   ID [4]
#>   ID    lapse     n
#>   <chr> <int> <int>
#> 1 A         1     8
#> 2 A         2     7
#> 3 B         1    13
#> 4 B         2    15
#> 5 C         1    18
#> 6 C         2     6
#> 7 D         1    17
#> 8 D         2    16

data %>%
  group_by(ID) %>%
  filter(!(sum(lapse ==  2) > 15))
#> # A tibble: 67 × 3
#> # Groups:   ID [3]
#>    ID       rt lapse
#>    <chr> <dbl> <int>
#>  1 B     0.517     2
#>  2 C     0.589     1
#>  3 C     0.598     2
#>  4 C     0.715     1
#>  5 B     0.475     2
#>  6 C     0.965     1
#>  7 B     0.234     1
#>  8 B     0.812     2
#>  9 C     0.517     1
#> 10 B     0.700     1
#> # … with 57 more rows

Created on 2023-01-08 with reprex v2.0.2

  • Related