Can someone help me understand what the grouping is doing here, please?
Why do these two pipelines produce different grouped outputs? The first returns the 'A' rows of every ID group where n() > 1, even when the group's other rows are outside category A, while the second returns only the groups where the duplicates exist within A.
Sample data:
library(dplyr)
df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))
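# Pipeline 1: group first, then filter on both conditions
df %>%
  group_by(ID) %>%
  filter(Acronym == 'A', n() > 1)

# Pipeline 2: keep only the 'A' rows first, then group and filter
df %>%
  filter(Acronym == 'A') %>%
  group_by(ID) %>%
  filter(n() > 1)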
CodePudding user response:
In the first example, the rows with Acronym != "A" are still in the data frame when n() is evaluated, so they contribute to the row count of each group.
In the second example, these rows are removed before grouping, so they don't contribute to the row count from n().
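To see the counts each filter is actually working with, here is a small sketch that tabulates the full group size next to the number of 'A' rows per ID. It assumes dplyr is installed and reuses the sample data from the question; the column names `group_size` and `n_A` are made up for illustration.

```r
library(dplyr)

df <- data.frame(ID = c(1, 1, 3, 4, 5, 6, 6),
                 Acronym = c('A', 'B', 'A', 'A', 'B', 'A', 'A'))

# Per-ID counts: n() counts every row in the group,
# while sum(Acronym == 'A') counts only the 'A' rows.
# group_size and n_A are illustrative names, not from the question.
counts <- df %>%
  group_by(ID) %>%
  summarise(group_size = n(),
            n_A = sum(Acronym == 'A'))

print(counts)
```

ID 1 has a group size of 2 but only one 'A' row, which is why the first pipeline keeps its 'A' row and the second drops it.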
CodePudding user response:
If we want the first case to return only ID 6, use sum to get the count of 'A' values in Acronym:
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(sum(Acronym == 'A') > 1)
As mentioned in the other post, it is just that n() is based on the whole group count and not on the number of 'A's. If we are unsure about the filter, create a column with mutate and check the output:
df %>%
  group_by(ID) %>%
  mutate(ind = Acronym == 'A' & n() > 1)
# A tibble: 7 × 3
# Groups:   ID [5]
     ID Acronym ind
  <dbl> <chr>   <lgl>
1     1 A       TRUE
2     1 B       FALSE
3     3 A       FALSE
4     4 A       FALSE
5     5 B       FALSE
6     6 A       TRUE
7     6 A       TRUE
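One caveat worth checking: filtering on sum(Acronym == 'A') > 1 keeps every row of a qualifying group, not just its 'A' rows. The sample data hides this because ID 6 contains only 'A' rows. The sketch below uses a hypothetical data frame `df2` (not from the question) in which ID 6 also has a 'B' row; the result names `by_sum`, `a_first`, and `combined` are likewise illustrative.

```r
library(dplyr)

# Hypothetical data: ID 6 has two 'A' rows and one 'B' row
df2 <- data.frame(ID = c(1, 1, 6, 6, 6),
                  Acronym = c('A', 'B', 'A', 'A', 'B'))

# Group-level condition only: keeps all three rows of ID 6,
# including the 'B' row
by_sum <- df2 %>%
  group_by(ID) %>%
  filter(sum(Acronym == 'A') > 1)

# Filter to 'A' first: keeps only the two 'A' rows of ID 6
a_first <- df2 %>%
  filter(Acronym == 'A') %>%
  group_by(ID) %>%
  filter(n() > 1)

# Row-level and group-level conditions together:
# same rows as a_first, without a separate pre-filter step
combined <- df2 %>%
  group_by(ID) %>%
  filter(Acronym == 'A', sum(Acronym == 'A') > 1)
```

If only the 'A' rows are wanted, either the pre-filter or the combined condition should do; which one reads better is mostly a matter of taste.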