Hi I am trying to do a pivot sum by group in R. Below my dataframe (my_df)
my_df <- data.frame(Municipality=c('a', 'a', 'a', 'a', 'a', 'a', 'a','a','b','b'),
section=c(1, 1, 1, 1, 1, 1, 2,2,1,1),
state=c('ac', 'ac', 'ac', 'ac', 'ac', 'ac', 'ac','ac','mg','mg'),
gender=c('male', 'male', 'female', 'male', 'female', 'female',
'male','female','female','female'),
age=c('60-64', '17', '18', '18', '21-24', '30-34', '19','40-44','60-64','50-54'),
age_code=c('6064', '1700', '1800', '1800', '2124', '3034',
'1900','4044','6064','5054'),
schooling=c('read_write', 'high school', 'elementary', 'elementary', 'illiterate', 'college', 'elementary incomplete','high school incomplete','illiterate', 'elementary'),
schooling_code=c(2, 4, 3, 3, 1, 5, 3.5, 4.5, 1, 2),
num_voters=c(1, 4, 2, 3, 9, 10, 1, 8, 11, 3))
df_desired <- data.frame(Municipality=c('a', 'a','b'),
section=c(1,2,1),
state=c('ac', 'ac','mg'),
tot_elect=c(27, 9, 14),
tot_male=c(8, 1, 0),
share_male=c(0.29,0.11,0),
share_illiterate=c(0.33, 0, 0.78),
share_60_64=c(0.03, 0, 0.78))
The df_desired is what I am trying to get. I would appreciate your help.
So far, I used
aggregate(df$num_voters, by=list(df$Municipality,df$section,df$state), FUN=sum)
CodePudding user response:
library(dplyr)
my_df %>%
group_by(Municipality, section, state) %>%
summarize(tot_elect = sum(num_voters),
tot_male = sum(num_voters[gender == "male"]),
share_male = tot_male / tot_elect,
share_illit = sum(num_voters[schooling == "illiterate"] / tot_elect),
share_60_64 = sum(num_voters[age == "60-64"] / tot_elect))