Home > database >  How to obtain maximum counts by group
How to obtain maximum counts by group

Time:11-28

Using tidyverse, I would like to obtain the maximum count of events (e.g., dates) by group. Here is a minimum reproducible example:

Data frame:

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5),
                 event = c(12, 6, 1, 7, 13, 9, 4, 8, 2, 5, 11, 3, 10, 14))

The following code produces the desired output, but seems overly complicated:

df %>% 
  group_by(id) %>% 
  mutate(count = n()) %>% 
  ungroup() %>% 
  select(count) %>% 
  slice_max(count, n = 1, with_ties = FALSE)

Is there a simpler/better way? The following works, but top_n has been superseded by slice_max and it is recommended that the latter be used instead.

df %>%
count(id) %>% 
distinct(n) %>% # to remove tied values 
top_n(1)

Any suggestions?

CodePudding user response:

If you want something with fewer steps, you could try base R table() to get the counts in a vector and then take the max(). By default it returns the max value only once even if it appears a few times in the vector.

max(table(df$id))
[1] 4

Or if you want it in tidyverse style

df$id %>% 
  table() %>% 
  max()

CodePudding user response:

It's not very clear what is the role of the values in the event column with respect to your question.

If you want the maximum number of events by group (where id is the grouping variable), then:

df %>% 
  group_by(id) %>% 
  summarise(max_n_events = max(event))

If instead you basically do not consider the specific values in the event column and only look at the id column, the solution proposed by @Josh above can also be written as follows:

df %>% group_by(id) %>% count() %>% ungroup() %>% summarise(max(n))
  • Related