I have a large data frame that includes the following:
subsetdf <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17))
I want to add a column, GroupSize, that tells for each Id how many Ids (itself included) share the same TicketNo value. In other words, I want output like this:
TheDream <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3))
I've unsuccessfully tried:
subsetdf <- subsetdf %>%
group_by(TicketNo) %>%
add_count(name = "GroupSize")
I'd like to use mutate() but I can't seem to get it right.
Edit
With the GroupSize column now added, I want to add a final column that looks at the values in two other columns and returns whichever is higher. So I've got:
df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4))
And I want:
df <- data_frame(Id=c(1:6),TicketNo=c(15,16,15,17,17,17),GroupSize=c(2,1,2,3,3,3),FamilySize=c(2,2,1,1,4,4),FinalSize=c(2,2,2,3,4,4))
I've unsuccessfully tried:
df <- df %>% pmax(df$GroupSize, df$FamilySize) %>% dplyr::mutate(FinalSize = n())
That attempt earns me the error:
Error: ! Subscript `i` is a matrix, the data `value` must have size 1.
Backtrace:
- ... %>% dplyr::mutate(Groupsize = n())
- base::pmax(., train_data$Family_size, train_data$PartySize)
- tibble:::`[<-.tbl_df`(`*tmp*`, change, value = `<int>`)
- tibble:::tbl_subassign_matrix(x, j, value, j_arg, substitute(value))
Answer:
If we need to use mutate, use n() to get the group size. Also, make sure that the mutate is from dplyr (there is also a plyr::mutate, which could mask the function if plyr is loaded later).
library(dplyr)
subsetdf %>%
group_by(TicketNo) %>%
dplyr::mutate(GroupSize = n())
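
For the FinalSize column from the edit, one possible approach is to call pmax() inside mutate() rather than piping the data frame into pmax(). In the failing attempt, the pipe passes the whole data frame as the first argument of pmax(), which is probably what triggers the error. The following is a minimal sketch, assuming the column names from the question; the variable name result and the ungroup() step are my own additions for illustration.

```r
library(dplyr)

# Example data copied from the question (tibble() is re-exported by dplyr)
df <- tibble(
  Id         = 1:6,
  TicketNo   = c(15, 16, 15, 17, 17, 17),
  FamilySize = c(2, 2, 1, 1, 4, 4)
)

result <- df %>%
  group_by(TicketNo) %>%
  dplyr::mutate(GroupSize = n()) %>%   # number of rows sharing each TicketNo
  ungroup() %>%                        # drop grouping before the row-wise comparison
  dplyr::mutate(FinalSize = pmax(GroupSize, FamilySize))  # element-wise maximum

result
```

pmax() is vectorised, so it compares the two columns row by row without any grouping or rowwise() step.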