Home > Blockchain >  combinining multiple summarize calls dplyr
combinining multiple summarize calls dplyr

Time:05-20

Given the df

ww <- data.frame(
    GM = c("A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C"),
    stanza = rep(c("Past", "Mid", "End"), 6),   
    change = c(1, 1.1, 1.4, 1, 1.3, 1.5, 1, 1.2, 1.4,
               1.1, 1.2, 1.3, .9, 1.2, 1.3, .9, 1.3, 1.5))

I would like to calculate the mean for Past for each GM and the divide each value in 'change' by the GM specific mean. I can do this with two dplyr calls and a join function as follows:

past <- ww %>%
    group_by(GM) %>%
    filter(stanza == "Past") %>%
    summarize(past.mean = mean(change))

ww <- left_join(ww, past, by = "GM")

ww %>%
    group_by(GM, stanza) %>%
    summarize(pr.change = change/past.mean)

But there must be a way to do this in one dplyr call.

CodePudding user response:

No need to join, you can compute this directly in one pipe chain:

ww %>% 
  group_by(GM) %>% 
  mutate(pr.change = change / mean(change[stanza == "Past"])) %>%
  ungroup()

Output

   GM    stanza change pr.change
   <chr> <chr>   <dbl>     <dbl>
 1 A     Past      1       1    
 2 A     Mid       1.1     1.1  
 3 A     End       1.4     1.4  
 4 A     Past      1       1    
 5 A     Mid       1.3     1.3  
 6 A     End       1.5     1.5  
 7 B     Past      1       0.952
 8 B     Mid       1.2     1.14 
 9 B     End       1.4     1.33 
10 B     Past      1.1     1.05 
11 B     Mid       1.2     1.14 
12 B     End       1.3     1.24 
13 C     Past      0.9     1    
14 C     Mid       1.2     1.33 
15 C     End       1.3     1.44 
16 C     Past      0.9     1    
17 C     Mid       1.3     1.44 
18 C     End       1.5     1.67 

CodePudding user response:

Using base R

transform(ww, pr.change = change/ave(replace(change,
    stanza != 'Past', NA), GM, FUN = function(x) mean(x, na.rm = TRUE)))

-output

 GM stanza change pr.change
1   A   Past    1.0  1.000000
2   A    Mid    1.1  1.100000
3   A    End    1.4  1.400000
4   A   Past    1.0  1.000000
5   A    Mid    1.3  1.300000
6   A    End    1.5  1.500000
7   B   Past    1.0  0.952381
8   B    Mid    1.2  1.142857
9   B    End    1.4  1.333333
10  B   Past    1.1  1.047619
11  B    Mid    1.2  1.142857
12  B    End    1.3  1.238095
13  C   Past    0.9  1.000000
14  C    Mid    1.2  1.333333
15  C    End    1.3  1.444444
16  C   Past    0.9  1.000000
17  C    Mid    1.3  1.444444
18  C    End    1.5  1.666667

CodePudding user response:

A data.table solution:

library(data.table)
setDT(ww)
ww[, pr.change := change / mean(change[stanza == "Past"]), GM]
    GM stanza change pr.change
 1:  A   Past    1.0  1.000000
 2:  A    Mid    1.1  1.100000
 3:  A    End    1.4  1.400000
 4:  A   Past    1.0  1.000000
 5:  A    Mid    1.3  1.300000
 6:  A    End    1.5  1.500000
 7:  B   Past    1.0  0.952381
 8:  B    Mid    1.2  1.142857
 9:  B    End    1.4  1.333333
10:  B   Past    1.1  1.047619
11:  B    Mid    1.2  1.142857
12:  B    End    1.3  1.238095
13:  C   Past    0.9  1.000000
14:  C    Mid    1.2  1.333333
15:  C    End    1.3  1.444444
16:  C   Past    0.9  1.000000
17:  C    Mid    1.3  1.444444
18:  C    End    1.5  1.666667
  • Related