Cannot group data by a column in R

Time:02-14

I want to count how many times each Player's name appears in my data frame ph1, and then see how many players share each count. My code shows how many times each name is in the dataset, but I cannot group it by n so I can see how many players appear once, how many appear twice, and so on.

I would prefer a dplyr solution but am open to others.

Desired output, for example:

n Number_Players
1  21
2  2

Code

ph1 %>%
     filter(!is.na(Player)) %>%
     group_by(Player) %>% 
     mutate(n= n()) %>%
     relocate(n, .after=Player)

Sample of my current output

Player              n   Year    PA  AB
Peavy, Jake         1   2008    58  49
Gallardo, Yovani    1   2014    59  53
Stratton, Chris     1   2018    50  43
Cashner, Andrew     1   2015    61  60
Anderson, Chase     1   2016    52  45
Burnes, Corbin      1   2021    59  52
Wolf, Randy         1   2009    80  67
Davies, Zach        2   2016    61  53
Senzatela, Antonio  1   2021    51  39
Syndergaard, Noah   2   2015    50  43

Sample data

structure(list(Player = c("Peavy, Jake", "Gallardo, Yovani", 
"Stratton, Chris", "Cashner, Andrew", "Anderson, Chase", "Burnes, Corbin", 
"Wolf, Randy", "Davies, Zach", "Senzatela, Antonio", "Syndergaard, Noah", 
"Syndergaard, Noah", "Davies, Zach", "Sánchez, Aníbal", "Hudson, Dakota", 
"Leake, Mike", "De La Rosa, Jorge", "Cueto, Johnny", "González, Gio", 
"Peralta, Wily", "Vólquez, Edinson", "Ryu, Hyun Jin", "Dempster, Ryan", 
"Holland, Derek", "Wheeler, Zack", "Sabathia, CC"), Year = c(2008, 
2014, 2018, 2015, 2016, 2021, 2009, 2016, 2021, 2015, 2016, 2019, 
2019, 2019, 2016, 2015, 2008, 2015, 2014, 2014, 2014, 2009, 2018, 
2014, 2008), PA = c(58, 59, 50, 61, 52, 59, 80, 61, 51, 50, 67, 
54, 56, 59, 55, 50, 56, 56, 68, 64, 56, 75, 56, 62, 50), AB = c(49, 
53, 43, 60, 45, 52, 67, 53, 39, 43, 58, 51, 52, 51, 49, 48, 45, 
43, 57, 53, 47, 64, 53, 50, 48)), row.names = c(NA, -25L), class = c("tbl_df", 
"tbl", "data.frame"))

I checked the existing Stack Overflow answers but did not find one that matches my need. If one exists, I would appreciate a link to it.

CodePudding user response:

We can use count() to get the frequency of each Player, then group by that frequency column (n) and count again with n()

library(dplyr)
ph1 %>% 
 filter(!is.na(Player)) %>%
 count(Player) %>% 
 group_by(n) %>% 
 summarise(Number_Players = n())

-output

# A tibble: 2 × 2
      n Number_Players
  <int>          <int>
1     1             21
2     2              2
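To see why the mutate() attempt in the question could not be grouped into this table, here is a minimal sketch on a hypothetical toy data frame (toy is made up for illustration, it is not ph1). mutate() keeps one row per record, while count() collapses to one row per player, which is what makes the second count possible.

```r
library(dplyr)

# Hypothetical toy data (not the asker's ph1): "B" appears twice
toy <- tibble(
  Player = c("A", "B", "B", "C"),
  Year   = c(2015, 2016, 2019, 2021)
)

# mutate() adds n but keeps every original row (4 rows in, 4 rows out)
kept <- toy %>%
  group_by(Player) %>%
  mutate(n = n()) %>%
  ungroup()

# count() collapses to one row per player (3 rows)
per_player <- toy %>% count(Player)

# counting n again gives the number of players for each frequency
per_count <- per_player %>% count(n, name = "Number_Players")

print(per_count)
```

Here kept still has four rows, so counting its n column would count records rather than players.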

or in base R, we can use table twice

stack(table(table(ph1$Player)))[2:1]
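The nested table() call can be hard to read, so here is a rough step-by-step sketch of the same idea on a hypothetical character vector (players is made up, it is not ph1$Player). For readability it builds the result with an explicit data.frame() instead of stack(), so the column types may differ from the one-liner.

```r
# Hypothetical toy vector (not ph1$Player): "B" appears twice
players <- c("A", "B", "B", "C")

# First table(): how many times each player appears
per_player <- table(players)

# Second table(): how many players share each frequency
per_count <- table(per_player)

# Explicit data frame in place of stack(); the names of per_count
# hold the frequencies as character strings, so convert to integer
result <- data.frame(
  n = as.integer(names(per_count)),
  Number_Players = as.vector(per_count)
)

print(result)
```

Note that table() drops NA values by default, which plays the role of the filter(!is.na(Player)) step in the dplyr version.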

CodePudding user response:

ph1 %>% 
  filter(!is.na(Player)) %>%
  group_by(Player) %>% count() %>%
  group_by(n) %>%
  count(name="Number_Players") 

# A tibble: 2 x 2
# Groups:   n [2]
      n Number_Players
  <int>          <int>
1     1             21
2     2              2

CodePudding user response:

Here is another dplyr way:

First we use add_count() to add each player's count as a new column n. Because add_count() keeps every row, we then drop the duplicated players with distinct(). Finally we apply count() on n (the column generated by add_count()).

ph1 %>% 
  add_count(Player) %>% 
  distinct(Player, .keep_all = TRUE) %>% 
  count(n, name = "Number_Players")

      n Number_Players
  <int>          <int>
1     1             21
2     2              2
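As a quick sanity check of the add_count() plus distinct() step, this sketch compares it with a plain count(Player) on a hypothetical toy data frame (toy is an assumption for illustration, not ph1). The two give the same per-player counts; in general the row order can differ, because distinct() keeps first-appearance order while count() sorts by Player.

```r
library(dplyr)

# Hypothetical toy data (not the asker's ph1): "B" appears twice
toy <- tibble(
  Player = c("A", "B", "B", "C"),
  Year   = c(2015, 2016, 2019, 2021)
)

# add_count() keeps all rows; distinct() then leaves one row per player
via_add_count <- toy %>%
  add_count(Player) %>%
  distinct(Player, .keep_all = TRUE) %>%
  select(Player, n)

# count() reaches one row per player directly
via_count <- toy %>% count(Player)

# Compare the two per-player tables column by column
same_players <- identical(via_add_count$Player, via_count$Player)
same_counts  <- identical(via_add_count$n, via_count$n)
print(c(same_players, same_counts))
```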