I have a dataframe with revenue numbers for different clients in different years. I want to calculate the total revenue for a client in a year. Then I want to calculate the mean of yearly revenue for the 3 highest grossing years.
I manage to do this with group_by and summarize. However in this way I "lose" all variables not used in grouping or calculation functions. I changed summarize to mutate but for one reason or another, this gives me different results.
Does anyone know why this happpens, or how to get same result in a different manner?
MASTERDATA%>%
group_by(Client,Year)%>%
summarise(Revenue=sum(Total, na.rm=TRUE))%>%
slice_max(n=3,order_by=Revenue,with_ties = FALSE )%>%
mutate(Revenue_top_3 = mean(Revenue))%>%
select(Client,Year,Revenue)%>%
ungroup()%>%
distinct()
Applying this to the mtcars dataset
mtcars%>%
group_by(gear,carb)%>%
mutate(Revenue=sum(mpg, na.rm=TRUE))%>%
slice_max(n=3,order_by=Revenue,with_ties = FALSE )%>%
mutate(Revenue_top_3 = mean(Revenue))%>%
select(gear,carb,Revenue)%>%
ungroup()%>%
distinct()
CodePudding user response:
In your attempt you are not taking mean
of 3 highest grossing years.
Here are two approaches -
library(dplyr)
mtcars%>%
group_by(gear,carb)%>%
summarise(Revenue=sum(mpg, na.rm=TRUE)) %>%
slice_max(n=3,order_by=Revenue,with_ties = FALSE) %>%
summarise(Revenue_top_3 = mean(Revenue))
# gear Revenue_top_3
# <dbl> <dbl>
#1 3 64.2
#2 4 98.1
#3 5 30.6
And with same output but one less step.
mtcars%>%
group_by(gear,carb)%>%
summarise(Revenue=sum(mpg, na.rm=TRUE)) %>%
summarise(Revenue_top_3 = Revenue %>% sort %>% tail(3) %>% mean())