Using tidyverse, I would like to obtain the maximum count of events (e.g., dates) by group. Here is a minimum reproducible example:
Data frame:
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5),
event = c(12, 6, 1, 7, 13, 9, 4, 8, 2, 5, 11, 3, 10, 14))
The following code produces the desired output, but seems overly complicated:
df %>%
group_by(id) %>%
mutate(count = n()) %>%
ungroup() %>%
select(count) %>%
slice_max(count, n = 1, with_ties = FALSE)
Is there a simpler/better way? The following works, but top_n
has been superseded by slice_max
and it is recommended that the latter be used instead.
df %>%
count(id) %>%
distinct(n) %>% # to remove tied values
top_n(1)
Any suggestions?
CodePudding user response:
If you want something with fewer steps, you could try base R table()
to get the counts in a vector and then take the max()
. By default it returns the max value only once even if it appears a few times in the vector.
max(table(df$id))
[1] 4
Or if you want it in tidyverse style
df$id %>%
table() %>%
max()
CodePudding user response:
It's not very clear what is the role of the values in the event
column with respect to your question.
If you want the maximum number of events by group (where id
is the grouping variable), then:
df %>%
group_by(id) %>%
summarise(max_n_events = max(event))
If instead you basically do not consider the specific values in the event
column and only look at the id
column, the solution proposed by @Josh above can also be written as follows:
df %>% group_by(id) %>% count() %>% ungroup() %>% summarise(max(n))