I have the following data frame df:
LoB AY DY claims paid
2 2002 0 1 652
2 2004 0 1 0
2 2005 0 1 0
2 2002 1 0 0
2 2004 1 0 0
2 2002 2 0 0
2 2002 3 0 0
I want to delete rows of the same AY if the sum of their df$paid is 0. Here, there are 2 rows containing AY=2004 and adding their corresponding "paid" values results to 0. Same with AY=2005. I want such rows removed. I am aiming for the following result:
LoB AY DY claims paid
2 2002 0 1 652
2 2002 1 0 0
2 2002 2 0 0
2 2002 3 0 0
How do I do this?
CodePudding user response:
You can use subset
from base R:
subset(df, ave(paid, AY, FUN = sum) != 0)
Output
LoB AY DY claims paid
1 2 2002 0 1 652
4 2 2002 1 0 0
6 2 2002 2 0 0
7 2 2002 3 0 0
Or with data.table
:
library(data.table)
setDT(df)[, if (sum(paid) != 0) .SD, by = AY]
Data
df <- structure(list(LoB = c(2L, 2L, 2L, 2L, 2L, 2L, 2L), AY = c(2002L,
2004L, 2005L, 2002L, 2004L, 2002L, 2002L), DY = c(0L, 0L, 0L,
1L, 1L, 2L, 3L), claims = c(1L, 1L, 1L, 0L, 0L, 0L, 0L), paid = c(652L,
0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-7L))
CodePudding user response:
Try the following dplyr
solution:
df %>%
group_by(AY) %>%
filter(sum(paid) != 0)
Output:
# LoB AY DY claims paid
# <int> <int> <int> <int> <int>
# 1 2 2002 0 1 652
# 2 2 2002 1 0 0
# 3 2 2002 2 0 0
# 4 2 2002 3 0 0