I have a dataset looking like this
temp <- structure(list(Date = c("23/06/2002", "24/06/2002", "25/06/2002",
"25/06/2002", "26/06/2002", "02/07/2002", "03/07/2002", "24/07/2002",
"15/07/2002", "17/07/2002", "22/07/2002"), payment = c(-1000,
1000, -1000, -1000, 1000, -1000, -1000, -1000, 1200, 1200, 200
), Code = c("M567", "M567", "M567", "M567", "XYZ", "M567",
"ABX", "M567", "M567", "M567", "M300"), ID = c("187", "98",
"187", "187", "12ee", NA, NA, NA, "111", "111", "11")), class = c("data.table",
"data.frame"), row.names = c(NA, -11L), groups = structure(list(
assignment = c("ABX", "M300", "M567", "XYZ"), .rows = structure(list(
7L, 11L, c(1L, 2L, 3L, 4L, 6L, 8L, 9L, 10L), 5L), ptype = integer(0), class =
c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), .internal.selfref = <pointer: 0x0000020274961ef0>)
Input:
Date payment Code ID
1: 23/06/2002 -1000 M567 187
2: 24/06/2002 1000 M567 98
3: 25/06/2002 -1000 M567 187
4: 25/06/2002 -1000 M567 187
5: 26/06/2002 1000 XYZ 12ee
6: 02/07/2002 -1000 M567 <NA>
7: 03/07/2002 -1000 ABX <NA>
8: 24/07/2002 -1000 M567 <NA>
9: 15/07/2002 1200 M567 111
10: 17/07/2002 1200 M567 111
11: 22/07/2002 200 M300 11
I want to delete row that would give sum = 0 for a same payment and Code Column (No need to consider ID Column)
For Example: 1000 and M567 should cancel out with any -1000 and M567 remaining rows should be the same
- Its just that I have to cancel the pair of ve and -ve from the dataset
Expected Output:
Date payment Code ID
1: 25/06/2002 -1000 M567 187
2: 25/06/2002 -1000 M567 187
3: 26/06/2002 1000 XYZ 12ee
4: 02/07/2002 -1000 M567 <NA>
5: 03/07/2002 -1000 ABX <NA>
6: 24/07/2002 -1000 M567 <NA>
7: 15/07/2002 1200 M567 111
8: 17/07/2002 1200 M567 111
9: 22/07/2002 200 M300 11
CodePudding user response:
Here is one possible way:
library(dplyr)
temp %>%
group_by(group = as.integer(gl(n(), 2,n()))) %>%
mutate(x = sum(payment)) %>%
filter(!(x == 0 & first(Code) == last(Code))) %>%
ungroup() %>%
select(-x, -group)
Date payment Code ID
<chr> <dbl> <chr> <chr>
1 25/06/2002 -1000 M567 187
2 25/06/2002 -1000 M567 187
3 26/06/2002 1000 XYZ 12ee
4 02/07/2002 -1000 M567 NA
5 03/07/2002 -1000 ABX NA
6 24/07/2002 -1000 M567 NA
7 15/07/2002 1200 M567 111
8 17/07/2002 1200 M567 111
9 22/07/2002 200 M300 11
CodePudding user response:
You can figure out the codes for which the sum of payments is zero, put them in a vector and remove all rows from the original data.frame that contain this code:
totalpay = aggregate(temp$payment ~ temp$Code, FUN=sum)
zeropay = totalpay[totalpay[,2]==0,1]
temp = temp[!temp$Code %in% zeropay,]
As a bonus, you can retain the vector zeropay
as a documentation of removed codes.