Home > Mobile >  Checking sums with condition in R
Checking sums with condition in R

Time:12-18

I need to check the sums in some dataframe. But its elements do not sum up because some of them contain other. In this example 051040 is a part of 051043. Although most elements ends with "0", those which last digit is 3 always bigger or equal to counterparts with last digit 0. In other words I need to find all elements ends with "3" and skip those counterparts with "0". In this example 051040 and 051050 must be skipped because of 051043 and 051053 presence. Note that element "05000" is a control sum, obviously it also must be skipped. So I need to find the difference: 05000 - SUM ("051010", "051043", "051053", "052020", "052100", "052220", "052310") = 0 Something like that. Here is an artificial example (actual dataframe is really big).

Area <- c("050000", "051010", "051040", "051043", "051050", "051053", "052020", "052100", "052220", "052310")
Total <- c(100,  28,  16,  22,  10,  10,  10,  10,  10,  10)
sodf <- data.frame(Area, Total)

Thanks very much in advance!

CodePudding user response:

You could use the substrings.

sodf[with(sodf, ave(substring(Area, 6) == 3, substr(Area, 1, 5), FUN=\(x) {
  if (any(x)) x else TRUE
})), ] |>
  (\(x) x[1, 2] - sum(x[-1, 2]))()
# [1] 0

In an lapply for multiple areas, I expanded slightly your sodf to demonstrate (not exactly sure however, how your area codes exactly look like, but first two digits seem relevant).

lapply(split(sodf, substr(sodf$Area, 1, 2)), \(x) {
  x <- x[ave(substring(Area, 6) == 3, substr(Area, 1, 5), FUN=\(x) {
    if (any(x)) x else TRUE
  }), ]
  x[1, 2] - sum(x[-1, 2])
})
# $`05`
# [1] 0
# 
# $`06`
# [1] 111

Data:

sodf <- structure(list(Area = c("050000", "051010", "051040", "051043", 
"051050", "051053", "052020", "052100", "052220", "052310", "060000", 
"061010", "061040", "061043", "061050", "061053", "062020", "062100", 
"062220", "062310"), Total = c(100, 28, 16, 22, 10, 10, 10, 10, 
10, 10, 211, 28, 16, 22, 10, 10, 10, 10, 10, 10)), row.names = c(NA, 
-20L), class = "data.frame")

CodePudding user response:

One row solution can be this:

sum(sodf[!sodf$Area %in% stringr::str_replace_all(sodf[!grepl('0$',sodf$Area),"Area"],'[1-9]$','0') & !sodf$Area %in% sodf[grepl('^[0] [1-9]{1}[0] $',sodf$Area),"Area"] ,"Total"])

How does it work ?

  1. You find rows with Area that does not end with 0. sodf[!grepl('0$',sodf$Area),"Area"] # "051043" "051053"
  2. You find how this row should end, if they had a 0. I used a function from stringr package. stringr::str_replace_all(sodf[!grepl('0$',sodf$Area),"Area"],'[1-9]$','0') # "051040" "051050"
  3. You find rows that are not included in group 2 and does not have '050000' structure (^[0] [1-9]{1}[0] $), then sum their total. sum(sodf[!sodf$Area %in% stringr::str_replace_all(sodf[!grepl('0$',sodf$Area),"Area"],'[1-9]$','0') & !sodf$Area %in% sodf[grepl('^[0] [1-9]{1}[0] $',sodf$Area),"Area"] ,"Total"] ) # 100

Same result can be achived splitting the code in multiple lines with dplyr package.

  •  Tags:  
  • r
  • Related