Divide different groups by reference group

Time:12-24

My problem is almost identical to this answered question: Divide different groups by reference group

I have this df, only with more grouping variables (and no result column):

# 16 rows: four per state, one "denominator" row plus three treated rows each
df <- data.frame(pop = c(1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3),
                 state = c("NJ","NJ","NJ","NJ","VT","VT","VT","VT","DC","DC","DC","DC","IL","IL","IL","IL"),
                 start_dt = as.Date(c("2010-01-01","2010-01-01","2010-01-01","2010-01-01","2010-01-02","2010-01-02","2010-01-02","2010-01-02","2010-02-03","2010-02-03","2010-02-03","2010-02-03","2010-03-05","2010-03-05","2010-03-05","2010-03-05")),
                 end_dt = as.Date(c("2011-01-01","2011-01-01","2011-01-01","2011-01-01","2011-01-02","2011-01-02","2011-01-02","2011-01-02","2011-02-03","2011-02-03","2011-02-03","2011-02-03","2011-03-05","2011-03-05","2011-03-05","2011-03-05")),
                 value = c(12,7,6,9,15,7,6,9,18,5,6,3,20,5,5,6),
                 group = c("denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3","denominator", "Treated1", "Treated2", "Treated3"))
# desired result column: c(1,0.58,0.5,0.75,1,0.46...)

I want to group the data by pop (population), state, start_dt, and end_dt, and then divide each value by the denominator value of the same grouping to get the result column. I tried the accepted answer and did something like:

library(dplyr)
df <- df %>%
  group_by(pop, state, start_dt, end_dt) %>%
  mutate(result = value / value[group == "denominator"])

df <- df %>%
   group_by(pop,state,start_dt,end_dt) %>%
   summarize(result = value[group != "denominator"] / value[group == "denominator"])

But I got this error:

group_by: 4 grouping variables (pop, state, start_dt, end_dt)
Error in `.fun()`:
! Problem while computing `result=value/value[group == "denominator"]`.
x `result` must be size 1, not 0.
i The error occurred in group 99: pop = "1", group = "Treated2", state =
  "DC", start_dt = 2010-01-01, end_dt = 2011-02-01.
Backtrace:
 1. ... %>% ...
 2. tidylog::mutate(., result=value/value[group == "denominator"])
 3. tidylog:::log_mutate(...)
 5. dplyr:::mutate.data.frame(.data, ...)

Any ideas?

CodePudding user response:

The likely issue is that at least one of the groups doesn't have a denominator row, so value[group == "denominator"] has length 0 in that group. We could use [ to subset the first element, which returns NA when the vector is empty.
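
The difference can be seen in base R alone. This is a minimal sketch; the value and group vectors are made up to stand in for one group that has no denominator row:

```r
# Made-up data for a single group without a "denominator" row
value <- c(7, 6, 9)
group <- c("Treated1", "Treated2", "Treated3")

denom <- value[group == "denominator"]  # zero-length numeric vector
plain <- value / denom                  # also zero-length, which mutate() cannot
                                        # recycle to the size of the group

denom_or_na <- denom[1]                 # indexing past the end returns NA
ratios <- value / denom_or_na           # one NA per row, same length as value
```

In a grouped pipeline the same [1] indexing is applied once per group, as in the summarize() call that follows.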

library(dplyr)
df %>%
  group_by(pop, state, start_dt, end_dt) %>%
  summarize(result = value[group != "denominator"] /
              value[group == "denominator"][1],
            group = group[group != "denominator"], .groups = "drop")
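
If the denominator rows should stay in the output with a result of 1, as in the result column the question describes, the same [1] indexing should also work inside mutate(). The sketch below is an assumption-laden illustration rather than the original answer. The toy data frame and the names bad_groups, n_denominator, and out are hypothetical, and only two grouping columns are used to keep it short. It also lists the groups that lack a denominator, which may help track down where the error comes from.

```r
library(dplyr)

# Hypothetical toy data: the (2, "DC") group has no "denominator" row
toy <- data.frame(
  pop   = c(1, 1, 1, 2, 2),
  state = c("NJ", "NJ", "NJ", "DC", "DC"),
  value = c(12, 7, 6, 5, 6),
  group = c("denominator", "Treated1", "Treated2", "Treated1", "Treated2")
)

# Diagnostic: groups that do not have exactly one denominator row
bad_groups <- toy %>%
  group_by(pop, state) %>%
  summarize(n_denominator = sum(group == "denominator"), .groups = "drop") %>%
  filter(n_denominator != 1)

# Keep every row; groups without a denominator get NA instead of an error
out <- toy %>%
  group_by(pop, state) %>%
  mutate(result = value / value[group == "denominator"][1]) %>%
  ungroup()
```

Here bad_groups should contain only the (2, "DC") group, and out keeps all five rows, with NA in result for the two rows of that group.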