Groupby and filter output is producing different results

Time:08-19

Can someone help me understand what the grouping is doing here, please? Why do these two pipelines produce different grouped outputs? The first one returns the 'A' rows of every ID that has more than one row in total, even when the other rows in that group are not 'A' (so ID 1 comes back because of its A/B pair). The second one returns only IDs that have more than one 'A' row, i.e. duplicates that exist within A alone.

Sample Data:

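library(dplyr)

df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))

# Version 1: group first, then filter (n() counts every row in the ID group)
df %>% 
  group_by(ID) %>%
  filter(Acronym == 'A', n() > 1)

# Version 2: keep only 'A' rows first, then group (n() counts only 'A' rows)
df %>% filter(Acronym == 'A') %>% 
  group_by(ID) %>%
  filter(n() > 1)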

Answer:

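In the first example, the rows where Acronym is not "A" (the "B" rows) are still in the data frame when n() is evaluated, so they count toward each group's size. ID 1 has one "A" row and one "B" row, so n() is 2 and its "A" row passes the filter.

In the second example, those rows are removed before grouping, so n() counts only the "A" rows. ID 1 then has a single row and is dropped, leaving only ID 6.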
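The difference comes down to when the group size is computed. The following is a minimal base-R sketch of that idea, assuming it is acceptable to use `ave(..., FUN = length)` as a stand-in for `n()` (this is a substitution for illustration, not how dplyr works internally). The names `size_all`, `size_a`, `v1` and `v2` are hypothetical.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))

# Group size per row on the full data (roughly what n() sees in version 1)
size_all <- ave(df$ID, df$ID, FUN = length)

# Version 1: keep 'A' rows whose whole ID group has more than one row
v1 <- df[df$Acronym == 'A' & size_all > 1, ]

# Version 2: drop non-'A' rows first, then recount group sizes
a_only <- df[df$Acronym == 'A', ]
size_a <- ave(a_only$ID, a_only$ID, FUN = length)
v2 <- a_only[size_a > 1, ]

print(v1$ID)  # [1] 1 6 6
print(v2$ID)  # [1] 6 6
```

ID 1 survives in `v1` only because its "B" row inflates the group size to 2.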

Answer:

If we want the first (group-then-filter) version to return only ID 6, use sum() to count the 'A' values in Acronym within each group instead of using n():

library(dplyr)
df %>% 
   group_by(ID) %>%
   filter(sum(Acronym == 'A') > 1)
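The same "count the A's per group" idea can be sketched in base R, assuming `ave()` with `FUN = sum` over a logical-turned-numeric vector is an acceptable stand-in for `sum(Acronym == 'A')` inside a grouped `filter()`. The names `a_count` and `kept` are hypothetical.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))

# Number of 'A' rows in each row's ID group (TRUE counts as 1 when summed)
a_count <- ave(as.numeric(df$Acronym == 'A'), df$ID, FUN = sum)

# Keep every row of an ID that has more than one 'A'
kept <- df[a_count > 1, ]
print(kept$ID)  # [1] 6 6
```

Like the dplyr version above, this keeps every row of a qualifying ID, including any non-'A' rows it might have; add `& df$Acronym == 'A'` to the condition if only the 'A' rows are wanted.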

As the other answer mentions, n() returns the size of the whole group, not the number of 'A's in it. If we are unsure what a filter condition will keep, compute it as a column with mutate() and check the output:

df %>%
    group_by(ID) %>%
    mutate(ind = Acronym == 'A' & n() > 1)
# A tibble: 7 × 3
# Groups:   ID [5]
     ID Acronym ind  
  <dbl> <chr>   <lgl>
1     1 A       TRUE 
2     1 B       FALSE
3     3 A       FALSE
4     4 A       FALSE
5     5 B       FALSE
6     6 A       TRUE 
7     6 A       TRUE 
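To see why the two conditions disagree, it can help to put both indicators side by side. This is a minimal base-R sketch of that debugging step, assuming `ave()` as a stand-in for the grouped `n()` and `sum()` calls; the column names `ind_n` and `ind_sum` are hypothetical.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))

grp_size <- ave(df$ID, df$ID, FUN = length)                       # like n()
a_count  <- ave(as.numeric(df$Acronym == 'A'), df$ID, FUN = sum)  # like sum(Acronym == 'A')

# Indicator based on the whole group size vs. one based on the count of 'A's
df$ind_n   <- df$Acronym == 'A' & grp_size > 1
df$ind_sum <- df$Acronym == 'A' & a_count > 1
print(df)
```

The two columns differ only on the ID 1 "A" row: its group has two rows but just one 'A'.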