Suppose I have the following 'companies' dataset:
Company Revenue
Pepsi 10
Pepsi 20
Pepsi 30
Pepsi 10
Fanta 10
Fanta 70
Fanta 90
Fanta 80
Fanta 20
Fanta 40
I want to find the sum of the last two rows for each company, i.e. I want the following output:
Company Sum
Pepsi 40
Fanta 60
where 40 = 20 10, and 60 = 20 40
I try to group by company, but I do not know how to get only the last two rows:
data %>%
group(Company) %>%
summarise(Sum = sum(revenue))
do you have any idea how to solve this?
CodePudding user response:
tail(Revenue, 2)
would give you last two values of Revenue
.
library(dplyr)
df %>% group_by(Company) %>% summarise(Revenue = sum(tail(Revenue, 2)))
# Company Revenue
# <chr> <int>
#1 Fanta 60
#2 Pepsi 40
You may also use indexing -
df %>% group_by(Company) %>% summarise(Revenue = sum(Revenue[(n()-1):n()]))
The logic can also be implemented in base R and data.table
#Base R
aggregate(Revenue~Company, df, function(x) sum(tail(x, 2)))
#data.table
library(data.table)
setDT(df)[, .(Revenue = sum(tail(Revenue, 2))),Company]
CodePudding user response:
You could also use slice
here:
df %>%
group_by(Company) %>%
slice((n()-1):n()) %>%
summarize(Revenue = sum(Revenue))