How to remove multiple rows based on their sum in R?-CodePudding

I have the following data frame df:

LoB    AY DY claims paid
  2  2002  0      1  652
  2  2004  0      1    0
  2  2005  0      1    0
  2  2002  1      0    0
  2  2004  1      0    0
  2  2002  2      0    0
  2  2002  3      0    0

I want to delete rows of the same AY if the sum of their df$paid is 0. Here, there are 2 rows containing AY=2004 and adding their corresponding "paid" values results to 0. Same with AY=2005. I want such rows removed. I am aiming for the following result:

LoB    AY DY claims paid
  2  2002  0      1  652
  2  2002  1      0    0
  2  2002  2      0    0
  2  2002  3      0    0

How do I do this?

CodePudding user response：

You can use subset from base R:

subset(df, ave(paid, AY, FUN = sum) != 0)

Output

  LoB   AY DY claims paid
1   2 2002  0      1  652
4   2 2002  1      0    0
6   2 2002  2      0    0
7   2 2002  3      0    0

Or with data.table:

library(data.table)

setDT(df)[, if (sum(paid) != 0) .SD, by = AY]

Data

df <- structure(list(LoB = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), AY = c(2002L, 
2004L, 2005L, 2002L, 2004L, 2002L, 2002L), DY = c(0L, 0L, 0L, 
1L, 1L, 2L, 3L), claims = c(1L, 1L, 1L, 0L, 0L, 0L, 0L), paid = c(652L, 
0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-7L))

CodePudding user response：

Try the following dplyr solution:

df %>% 
  group_by(AY) %>%
  filter(sum(paid) != 0)

Output:

#     LoB    AY    DY claims  paid
#   <int> <int> <int>  <int> <int>
# 1     2  2002     0      1   652
# 2     2  2002     1      0     0
# 3     2  2002     2      0     0
# 4     2  2002     3      0     0