Groupby and sum last rows with dplyr-CodePudding

Suppose I have the following 'companies' dataset:

Company      Revenue  
Pepsi          10      
Pepsi          20    
Pepsi          30     
Pepsi          10       
Fanta          10       
Fanta          70        
Fanta          90      
Fanta          80     
Fanta          20       
Fanta          40

I want to find the sum of the last two rows for each company, i.e. I want the following output:

Company       Sum 
Pepsi         40      
Fanta         60

where 40 = 20 10, and 60 = 20 40

I try to group by company, but I do not know how to get only the last two rows:

data %>%
group(Company) %>%
summarise(Sum = sum(revenue))

do you have any idea how to solve this?

CodePudding user response：

tail(Revenue, 2) would give you last two values of Revenue.

library(dplyr)

df %>% group_by(Company) %>% summarise(Revenue = sum(tail(Revenue, 2)))

#  Company Revenue
#  <chr>     <int>
#1 Fanta        60
#2 Pepsi        40

You may also use indexing -

df %>% group_by(Company) %>% summarise(Revenue = sum(Revenue[(n()-1):n()]))

The logic can also be implemented in base R and data.table

#Base R
aggregate(Revenue~Company, df, function(x) sum(tail(x, 2)))

#data.table

library(data.table)
setDT(df)[, .(Revenue = sum(tail(Revenue, 2))),Company]

CodePudding user response：

You could also use slice here:

df %>% 
  group_by(Company) %>% 
  slice((n()-1):n()) %>% 
  summarize(Revenue = sum(Revenue))