How do I get the most frequent entry in R?
For example, suppose I have data in two columns:
Name-City
A-New York
A-New York
A-Montreal
A-New York
B-Chicago
B-Chicago
B-New York
B-Detroit
I would like to get a dataframe with:
Name-City
A-New York
B-Chicago
So it should contain each unique entry in "Name" together with its most frequent entry in "City".
My idea would be something like:
df %>%
  group_by(Name) %>%
  count(City)
CodePudding user response:
Using mtcars, let's say we want to find the most frequent number of gears for each number of cylinders:
library(tidyverse)
mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n()) %>%
  filter(count == max(count))
# A tibble: 3 x 3
# Groups:   cyl [3]
    cyl  gear count
  <dbl> <dbl> <int>
1     4     4     8
2     6     4     4
3     8     3    12
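First we count how many times each cylinder/gear combination occurs. Because summarize() drops the last grouping variable, the result is still grouped by cyl, so the filter keeps the largest (i.e. most frequent) count within each cylinder group. Note that if two gear values tie for the highest count, both rows are kept.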
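Applied to the data from the question, the same count-then-filter idea could be sketched as follows. The data frame df is reconstructed here from the example rows, so its exact structure (two character columns) is an assumption, as is the result name most_frequent.

```r
library(dplyr)

# Example data reconstructed from the question (assumed structure)
df <- data.frame(
  Name = c("A", "A", "A", "A", "B", "B", "B", "B"),
  City = c("New York", "New York", "Montreal", "New York",
           "Chicago", "Chicago", "New York", "Detroit")
)

most_frequent <- df %>%
  group_by(Name, City) %>%
  # Count each Name/City pair; the result stays grouped by Name
  summarize(count = n(), .groups = "drop_last") %>%
  # Keep the most frequent city per Name (ties would all be kept)
  filter(count == max(count)) %>%
  ungroup()

print(most_frequent)
```

Setting .groups = "drop_last" only makes the default regrouping explicit and silences the informational message; it does not change the result.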
CodePudding user response:
library(dplyr)
df %>%
  group_by(Name) %>%
  count(City) %>%
  top_n(1)
# Selecting by n
# # A tibble: 2 x 3
# # Groups: Name [2]
# Name City n
# <chr> <chr> <int>
# 1 A New York 3
# 2 B Chicago 2
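As of dplyr 1.0.0, top_n() is documented as superseded in favour of slice_max() and slice_min(). A minimal sketch of the same approach with slice_max() might look like this; the data frame is rebuilt from the question's example rows (an assumption about its structure), and top_city is just an illustrative name.

```r
library(dplyr)

# Example data reconstructed from the question (assumed structure)
df <- data.frame(
  Name = c("A", "A", "A", "A", "B", "B", "B", "B"),
  City = c("New York", "New York", "Montreal", "New York",
           "Chicago", "Chicago", "New York", "Detroit")
)

top_city <- df %>%
  count(Name, City) %>%   # one row per Name/City pair, count stored in column n
  group_by(Name) %>%
  # Keep the single highest-count row per Name; with_ties = FALSE
  # returns exactly one row even when counts are tied
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_city)
```

With with_ties = FALSE, a tie is broken by row order, so drop that argument if all tied cities should be returned.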