I have a problem almost identical to this answered question: Divide different groups by reference group
I have a df like this one, only with more grouping variables (and without the result column):
df <- data.frame(pop = c(1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3),
  state = c("NJ","NJ","NJ","NJ","VT","VT","VT","VT","DC","DC","DC","DC","IL","IL","IL","IL"),
  start_dt = c("2010-01-01","2010-01-01","2010-01-01","2010-01-01","2010-01-02","2010-01-02","2010-01-02","2010-01-02","2010-02-03","2010-02-03","2010-02-03","2010-02-03","2010-03-05","2010-03-05","2010-03-05","2010-03-05"),
  end_dt = c("2011-01-01","2011-01-01","2011-01-01","2011-01-01","2011-01-02","2011-01-02","2011-01-02","2011-01-02","2011-02-03","2011-02-03","2011-02-03","2011-02-03","2011-03-05","2011-03-05","2011-03-05","2011-03-05"),
  value = c(12,7,6,9,15,7,6,9,18,5,6,3,20,5,5,6),
  group = c("denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3"),
  result = c(1,0.58,0.5,0.75,1,0.46...))
I also want to group the data by pop (population), state, start_dt and end_dt, and then divide the value of each group by the denominator value within the same grouping to get the result column. I tried the accepted answer and did something like:
library(dplyr)
df <- df %>%
  group_by(pop, state, start_dt, end_dt) %>%
  mutate(result = value / value[group == "denominator"])
df <- df %>%
  group_by(pop, state, start_dt, end_dt) %>%
  summarize(result = value[group != "denominator"] / value[group == "denominator"])
But I got this error:
group_by: 4 grouping variables (pop, state, start_dt, end_dt)
Error in `.fun()`:
! Problem while computing `result=value/value[group == "denominator"]`.
x `result` must be size 1, not 0.
i The error occurred in group 99: pop = "1", group = "Treated2", state =
"DC", start_dt = 2010-01-01, end_dt = 2011-02-01.
Backtrace:
1. ... %>% ...
2. tidylog::mutate(., result=value/value[group == "denominator"])
3. tidylog:::log_mutate(...)
5. dplyr:::mutate.data.frame(.data, ...)
Any ideas?
CodePudding user response:
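The issue is probably that at least one of the groups has no "denominator" row, so value[group == "denominator"] has length 0 in that group. We could use [1] to take the first element, which returns NA when there is nothing to take, so the result for such groups becomes NA instead of an error.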
library(dplyr)
df %>%
  group_by(pop, state, start_dt, end_dt) %>%
  summarize(result = value[group != "denominator"] /
                     value[group == "denominator"][1],
            group = group[group != "denominator"], .groups = "drop")
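To see why the [1] helps, here is a minimal base R sketch of what happens inside a single group. The vectors value, group, value2 and group2 are made-up toy data for illustration, not taken from the real df, and the snippet only shows the indexing behaviour, not the full dplyr pipeline.

```r
# Toy group with no "denominator" row (hypothetical values)
value <- c(7, 6, 9)
group <- c("Treated1", "Treated2", "Treated3")

# The logical index is all FALSE, so nothing is selected
denom <- value[group == "denominator"]
length(denom)                     # zero-length vector

# Dividing by a zero-length vector gives a zero-length result,
# which is what mutate() rejects with "must be size 1, not 0"
ratio_plain <- value / denom

# Indexing past the end of a vector yields NA, so denom[1] is NA
# and the division returns one NA per row instead of nothing
ratio_safe <- value / denom[1]

# Toy group that does have a denominator: [1] changes nothing here
value2 <- c(12, 7, 6, 9)
group2 <- c("denominator", "Treated1", "Treated2", "Treated3")
ratio_ok <- value2 / value2[group2 == "denominator"][1]

print(ratio_plain)
print(ratio_safe)
print(ratio_ok)
```

Groups without a denominator then show up as NA in result, which also makes them easy to find afterwards with is.na(result).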