I want to count how many times each Player's name appears in my data frame ph1, and then summarise how many players share each count. My code shows how many times each name is in the dataset, but I cannot group by n to see how many players appear once, how many appear twice, and so on.
I would prefer a dplyr solution but am open to others.
For example:
n Number_Players
1 21
2 2
Code
ph1 %>%
filter(!is.na(Player)) %>%
group_by(Player) %>%
mutate(n= n()) %>%
relocate(n, .after=Player)
Sample of how my output is appearing
Player n Year PA AB
Peavy, Jake 1 2008 58 49
Gallardo, Yovani 1 2014 59 53
Stratton, Chris 1 2018 50 43
Cashner, Andrew 1 2015 61 60
Anderson, Chase 1 2016 52 45
Burnes, Corbin 1 2021 59 52
Wolf, Randy 1 2009 80 67
Davies, Zach 2 2016 61 53
Senzatela, Antonio 1 2021 51 39
Syndergaard, Noah 2 2015 50 43
Sample data
structure(list(Player = c("Peavy, Jake", "Gallardo, Yovani",
"Stratton, Chris", "Cashner, Andrew", "Anderson, Chase", "Burnes, Corbin",
"Wolf, Randy", "Davies, Zach", "Senzatela, Antonio", "Syndergaard, Noah",
"Syndergaard, Noah", "Davies, Zach", "Sánchez, Aníbal", "Hudson, Dakota",
"Leake, Mike", "De La Rosa, Jorge", "Cueto, Johnny", "González, Gio",
"Peralta, Wily", "Vólquez, Edinson", "Ryu, Hyun Jin", "Dempster, Ryan",
"Holland, Derek", "Wheeler, Zack", "Sabathia, CC"), Year = c(2008,
2014, 2018, 2015, 2016, 2021, 2009, 2016, 2021, 2015, 2016, 2019,
2019, 2019, 2016, 2015, 2008, 2015, 2014, 2014, 2014, 2009, 2018,
2014, 2008), PA = c(58, 59, 50, 61, 52, 59, 80, 61, 51, 50, 67,
54, 56, 59, 55, 50, 56, 56, 68, 64, 56, 75, 56, 62, 50), AB = c(49,
53, 43, 60, 45, 52, 67, 53, 39, 43, 58, 51, 52, 51, 49, 48, 45,
43, 57, 53, 47, 64, 53, 50, 48)), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame"))
I checked the existing Stack Overflow solutions but did not see one that matches my need. If one exists, I would appreciate a link to it.
CodePudding user response:
We can use count() to get the frequency of each Player, then group by that frequency and count again with n():
library(dplyr)
ph1 %>%
filter(!is.na(Player)) %>%
count(Player) %>%
group_by(n) %>%
summarise(Number_Players = n())
-output
# A tibble: 2 × 2
n Number_Players
<int> <int>
1 1 21
2 2 2
Or, in base R, we can apply table() twice:
stack(table(table(ph1$Player)))[2:1]
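To make the nested table() call easier to follow, here is a minimal step-by-step sketch. The vector players is hypothetical toy data standing in for ph1$Player, and the sketch swaps in as.data.frame() for stack() purely for readability; it is an illustration, not the answer's exact code. Note that table() drops NA values by default, which plays the role of the filter(!is.na(Player)) step.

```r
# Hypothetical toy vector standing in for ph1$Player (an assumption for illustration)
players <- c("A", "B", "B", "C", "D", "D")

# Inner table: number of appearances per player
per_player <- table(players)

# Outer table: number of players sharing each appearance count
per_count <- table(per_player)

# Convert to a data frame; as.data.frame() is used here instead of stack()
result <- as.data.frame(per_count, stringsAsFactors = FALSE)
names(result) <- c("n", "Number_Players")
result$n <- as.integer(result$n)
result
```

In the toy data, A and C appear once and B and D appear twice, so each value of n is shared by two players.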
CodePudding user response:
ph1 %>%
filter(!is.na(Player)) %>%
group_by(Player) %>% count() %>%
group_by(n) %>%
count(name="Number_Players")
# A tibble: 2 x 2
# Groups: n [2]
n Number_Players
<int> <int>
1 1 21
2 2 2
CodePudding user response:
Here is another dplyr way. First, add_count() adds the count as a new column. Then distinct() keeps one row per Player, so that players who appear more than once are not counted repeatedly. Finally, count() on n (the column generated by add_count()) gives the number of players for each count:
ph1 %>%
add_count(Player) %>%
distinct(Player, .keep_all = TRUE) %>%
count(n, name= "Number_Players")
n Number_Players
<int> <int>
1 1 21
2 2 2
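The role of distinct() is easiest to see by running the pipeline with and without it. The sketch below uses a hypothetical toy tibble (toy, not the real ph1) and an illustrative column name Number_Rows, both assumptions made only for this example.

```r
library(dplyr)

# Hypothetical toy data standing in for ph1 (an assumption for illustration)
toy <- tibble(Player = c("A", "B", "B", "C", "D", "D"))

# add_count() keeps every row and attaches n, the appearances of that row's Player
with_n <- toy %>% add_count(Player)

# Without distinct(): counts rows, so players appearing twice are counted twice
rows_per_n <- with_n %>%
  count(n, name = "Number_Rows")

# With distinct(): one row per Player, so the result counts players
players_per_n <- with_n %>%
  distinct(Player, .keep_all = TRUE) %>%
  count(n, name = "Number_Players")

rows_per_n
players_per_n
```

For n = 2 the first result reports 4 rows while the second reports 2 players, which is why the distinct() step is needed before the final count().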