I have a dataframe in the following structure:
structure(list(SUB_DISTRICT_CODE = c(90101L, 90101L, 90101L,
90101L, 90101L, 90101L, 90102L, 90102L, 90102L, 90102L, 90102L,
90102L, 90103L, 90103L, 90103L, 90103L, 90103L, 90103L), SEX = c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L), AGR3 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L), TOTAL_per_GROUP = c(184L, 1245L, 456L,
167L, 1216L, 567L, 91L, 463L, 150L, 96L, 476L, 217L, 118L, 618L,
256L, 116L, 627L, 293L)), row.names = 21295:21312, class = "data.frame")
At the moment there are 6 entries for every SUB_DISTRICT CODE. In the final dataframe there should only be 3 entries for every SUB_DISTRICT_CODE (one for each unique value of AGR3). The column SEX should be dropped and the TOTAL_per_GROUP values be summarized by the AGR3 column. How can I do this in a easy way (using dplyr)? Thanks
CodePudding user response:
Try using group_by()
for AGR3
and SUB_DISTRICT_CODE
and then summarise()
:
df %>% group_by(AGR3, SUB_DISTRICT_CODE) %>%
summarise(sum = sum(TOTAL_per_GROUP))
Output:
# AGR3 SUB_DISTRICT_CODE sum
# <int> <int> <int>
# 1 1 90101 351
# 2 1 90102 187
# 3 1 90103 234
# 4 2 90101 2461
# 5 2 90102 939
# 6 2 90103 1245
# 7 3 90101 1023
# 8 3 90102 367
# 9 3 90103 549
CodePudding user response:
# Base R solution: base_r_res => data.frame
base_r_res <- aggregate(
TOTAL_per_GROUP ~ AGR3 SUB_DISTRICT_CODE,
data = df,
FUN = sum
)
# Send result to console: data.frame => stdout(console)
base_r_res
# Data table solution: import library
library(data.table)
# Aggregate by group: dt_res => data.table
dt_res = setDT(df)[,.(TOTAL_per_GROUP=sum(TOTAL_per_GROUP)), by = list(AGR3, SUB_DISTRICT_CODE)]
# Send result to console: data.table => stdout(console)
dt_res