Home > Enterprise >  Aggregating across a multiple choice question data structure
Aggregating across a multiple choice question data structure

Time:09-17

I have a dataset that has a multiple choice question that's been broken out so each option is it's own column. I also have weight column and I'm looking to aggregate across each column of the multiple choice to find the sum of the weight column that selected that option. I know I could do this in a for loop to go across each column, but I was looking for a cleaner method.

Any suggestions?

Dput:

structure(list(`Medical (treatment for any diagnosed medical condition)` = c("0", 
NA, "0", "Medical (treatment for any diagnosed medical condition)", 
"Medical (treatment for any diagnosed medical condition)", "Medical (treatment for any diagnosed medical condition)", 
"Medical (treatment for any diagnosed medical condition)", "0", 
"0", "0"), `Dental (preventive or routine care for oral health)` = c("0", 
NA, "0", "Dental (preventive or routine care for oral health)", 
"0", "Dental (preventive or routine care for oral health)", "Dental (preventive or routine care for oral health)", 
"0", "0", "0"), `Vision (preventive or routine care for eye health)` = c("0", 
NA, "Vision (preventive or routine care for eye health)", "Vision (preventive or routine care for eye health)", 
"0", "Vision (preventive or routine care for eye health)", "0", 
"0", "0", "0"), `Wellness incentives program (such as smoking cessation or weight loss programs)` = c("0", 
NA, "0", "0", "0", "0", "0", "0", "Wellness incentives program (such as smoking cessation or weight loss programs)", 
"Wellness incentives program (such as smoking cessation or weight loss programs)"
), `Gym membership` = c("0", NA, "0", "0", "0", "0", "0", "Gym membership", 
"Gym membership", "Gym membership"), `Life insurance` = c("Life insurance", 
NA, "Life insurance", "Life insurance", "0", "Life insurance", 
"0", "0", "0", "Life insurance"), `Accidental death and dismemberment` = c("0", 
NA, "0", "Accidental death and dismemberment", "0", "Accidental death and dismemberment", 
"0", "0", "0", "0"), `Health savings account (HSA)` = c("0", 
NA, "0", "0", "Health savings account (HSA)", "Health savings account (HSA)", 
"Health savings account (HSA)", "0", "0", "Health savings account (HSA)"
), `Flexible spending account (FSA)` = c("0", NA, "0", "0", "0", 
"0", "0", "0", "0", "0"), `Health reimbursement arrangements (HRA)` = c("Health reimbursement arrangements (HRA)", 
NA, "0", "0", "0", "0", "Health reimbursement arrangements (HRA)", 
"0", "0", "0"), `State medical savings program (MSP)` = c("0", 
NA, "0", "0", "0", "0", "0", "State medical savings program (MSP)", 
"0", "0"), `Pharmacy card` = c("0", NA, "0", "0", "0", "0", "0", 
"0", "0", "0"), `Over-the-counter/supplemental benefits` = c("0", 
NA, "0", "0", "Over-the-counter/supplemental benefits", "Over-the-counter/supplemental benefits", 
"0", "0", "0", "0"), `Other, please specify:` = c("0", NA, "0", 
"0", "0", "0", "0", "0", "0", "0"), weight = c(102099.073110486, 
99927.5001779742, 100569.577453382, 98542.6996469168, 38237.391427588, 
82498.4815047276, 99924.9006748349, 82498.4815047276, 79954.3106465319, 
91795.9294203397)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

Perhaps this helps

library(dplyr)
library(purrr)
map(setdiff(names(df1), "weight"),
   ~  df1 %>% 
       select(.x, weight) %>%
       group_by(across(all_of(.x))) %>%
       summarise(weight = sum(weight, na.rm = TRUE)))
  • Related