My dataframe has 2 categorical variables, and one is of a lower hierarchy than the other. I want to sum the numerical value of all rows within the sub-category using dplyr.
Thank you in advance for all those who can help me with!
This is the dataframe I start with:
transportation <- data.frame(
Country = c("A", "A", "A", "B", "B", "B"),
Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
Energy = c(10000, 9000, 20000, 200000, 160000, 450000)
)
And this is the dataframe I want to end up with:
country_sum <- data.frame(
Country = c("A", "A", "A", "B", "B", "B"),
Mode = c("Car", "Train", "Plane", "Car", "Train", "Plane"),
Energy = c(10000, 9000, 20000, 200000, 160000, 450000),
country_sum = c(39000, 39000, 39000, 810000, 810000, 810000)
)
CodePudding user response:
First group by Country
and then mutate
with sum
:
library(dplyr)
transportation %>%
group_by(Country) %>%
mutate(country_sum = sum(Energy))
Country Mode Energy country_sum
<chr> <chr> <dbl> <dbl>
1 A Car 10000 39000
2 A Train 9000 39000
3 A Plane 20000 39000
4 B Car 200000 810000
5 B Train 160000 810000
6 B Plane 450000 810000
CodePudding user response:
Using ave
.
dplyr::mutate(transportation, c_sum=ave(Energy, Country, FUN=sum))
# Country Mode Energy c_sum
# 1 A Car 10000 39000
# 2 A Train 9000 39000
# 3 A Plane 20000 39000
# 4 B Car 200000 810000
# 5 B Train 160000 810000
# 6 B Plane 450000 810000