Home > database >  How to perform the equivalent of Excel sumifs in dplyr where there are multiple conditions?
How to perform the equivalent of Excel sumifs in dplyr where there are multiple conditions?

Time:09-06

I get the correct output shown below, with code beneath, in the SumIfs_1 column which calculates the sum of all Code2's in the array for the single condition where all Code1's in the array are < current row Code1:

  Name Group Code1 Code2 SumIfs_1
1    B     1     0     1        1
2    R     1     1     0        2
3    R     1     1     2        2
4    R     2     3     0        4
5    R     2     3     1        4
6    B     3     4     2        5
7    A     3    -1     1        0
8    A     4     0     0        1
9    A     1     0     0        1

Code:

library(dplyr)

myData <- 
  data.frame(
    Name = c("B","R","R","R","R","B","A","A","A"),
    Group = c(1,1,1,2,2,3,3,4,1),
    Code1 = c(0,1,1,3,3,4,-1,0,0),
    Code2 = c(1,0,2,0,1,2,1,0,0)
  )

myData %>% mutate(SumIfs_1 = sapply(1:n(), function(x) sum(Code2[1:n()][Code1[1:n()] < Code1[x]])))

I'd like to expand the code to add another condition to the above sumifs() equivalent, creating a sumifs() with multiple conditions, where we add only those Code 2's for Groups in the array that are < than the current row Group, as further explained in this image (orange shows what already works in the Excel equivalent of the above code for SumIfs_1, yellow shows the sumifs() with more conditions that I am trying to add (SumIfs_2)):

enter image description here

Any recommendations for how to do this?

I'd like to stick with sapply() if possible, and more importantly I'd like to stick with dplyr or base R as I'm trying to prevent package bloat.

For what it's worth, here's my humble attempt to generate the SumIfs_2 column (which does not work):

myData %>% mutate(SumIfs_2 = sapply(1:n(), function(x) sum(Code2[1:n()][Code1[1:n()] < Code1[x]][Group[1:n()] < Group[x]])))

CodePudding user response:

You're doing the same thing pretty much, you just need to add another & condition where you are subsetting.

Also you don't need to call Code1[1:n()], when you call Code1 it already takes all of the values in the Code1 column.

I believe you are looking for

myData %>% mutate(SumIfs_2 = sapply(1:n(), function(x) sum(Code2[(Code1 < Code1[x]) & (Group < Group[x])])))
  Name Group Code1 Code2 SumIfs_2
1    B     1     0     1        0
2    R     1     1     0        0
3    R     1     1     2        0
4    R     2     3     0        3
5    R     2     3     1        3
6    B     3     4     2        4
7    A     3    -1     1        0
8    A     4     0     0        1
9    A     1     0     0        0
  • Related