Home > Blockchain >  How to sum rows based on group and sub-group using dplyr in R?
How to sum rows based on group and sub-group using dplyr in R?

Time:04-20

My dataframe has 2 categorical variables, and one is of a lower hierarchy than the other. I want to sum the numerical value of all rows within the sub-category using dplyr.

Thank you in advance for all those who can help me with!

This is the dataframe I start with:

transportation <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
  Energy = c(10000, 9000, 20000, 200000, 160000, 450000)
)

Starting Dataframe

And this is the dataframe I want to end up with:

country_sum <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
  Energy = c(10000, 9000, 20000, 200000, 160000, 450000),
  country_sum = c(39000, 39000, 39000, 810000, 810000, 810000)
)

Final Dataframe

CodePudding user response:

First group by Country and then mutate with sum:

library(dplyr)

transportation %>% 
  group_by(Country) %>% 
  mutate(country_sum = sum(Energy))

 Country Mode  Energy country_sum
  <chr>   <chr>  <dbl>       <dbl>
1 A       Car    10000       39000
2 A       Train   9000       39000
3 A       Plane  20000       39000
4 B       Car   200000      810000
5 B       Train 160000      810000
6 B       Plane 450000      810000

CodePudding user response:

Using ave.

dplyr::mutate(transportation, c_sum=ave(Energy, Country, FUN=sum))
#   Country  Mode Energy  c_sum
# 1       A   Car  10000  39000
# 2       A Train   9000  39000
# 3       A Plane  20000  39000
# 4       B   Car 200000 810000
# 5       B Train 160000 810000
# 6       B Plane 450000 810000
  • Related