How to create a column that lists the number of occurrences of X in another column?

I've got a huge df that includes the following:

subsetdf <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17))

I want to add a column, GroupSize, that tells for each Id how many Ids (itself included) share the same TicketNo value. In other words, I want output like this:

TheDream <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3))

I've unsuccessfully tried:

subsetdf <- subsetdf %>%
  group_by(TicketNo) %>%
  add_count(name = "GroupSize")

I'd like to use mutate() but I can't seem to get it right.

Edit: With the GroupSize column now added, I want to add a final column that compares the values in two other columns, row by row, and returns whichever is higher. So I've got:

df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4))

And I want:

df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4),FinalSize=c(2,2,2,3,4,4))

I've unsuccessfully tried:

df <- df %>% pmax(df$GroupSize, df$FamilySize) %>% dplyr::mutate(FinalSize = n())

That attempt earns me the error:

Error:
! Subscript `i` is a matrix, the data `value` must have size 1.
Backtrace:
... %>% dplyr::mutate(Groupsize = n())
base::pmax(., train_data$Family_size, train_data$PartySize)
tibble:::`[<-.tbl_df`(`*tmp*`, change, value = <int>)
tibble:::tbl_subassign_matrix(x, j, value, j_arg, substitute(value))

CodePudding user response：

If you want to use mutate(), call n() inside it to get the size of each group. Also make sure that the mutate() being called is the one from dplyr: plyr has its own mutate(), which masks dplyr's version if plyr is loaded later.

library(dplyr)
subsetdf %>%
   group_by(TicketNo) %>%
   dplyr::mutate(GroupSize = n())
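
For the second part of the question (the row-wise maximum), here is a minimal sketch that assumes the column names from the question. The original attempt pipes the whole data frame into pmax() as its first argument, which seems to be what triggers the tibble subassignment error. Calling pmax() on the two columns inside mutate() avoids that. The name `result` is only illustrative.

```r
library(dplyr)

# Sample data from the question; tibble() replaces the deprecated data_frame().
df <- tibble(
  Id = 1:6,
  TicketNo = c(15, 16, 15, 17, 17, 17),
  FamilySize = c(2, 2, 1, 1, 4, 4)
)

result <- df %>%
  group_by(TicketNo) %>%
  dplyr::mutate(GroupSize = n()) %>%   # number of rows sharing each TicketNo
  ungroup() %>%                        # drop the grouping before row-wise work
  dplyr::mutate(FinalSize = pmax(GroupSize, FamilySize))  # element-wise maximum

print(result)
```

pmax() is vectorised, so it compares GroupSize and FamilySize element by element and needs no grouping. The ungroup() call is there so that later steps do not silently run per TicketNo.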