Home > Software engineering >  How to sum values if the number of rows exceeds the desired number in R
How to sum values if the number of rows exceeds the desired number in R

Time:03-22

Example of mydata

mydata=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = 1:53, 
    line_fact_amt = c(2026319.95, 2509383.87, 2859932.59, 2652536.9, 
    2144475.76, 2844606.9, 3293532.23, 2383631.93, 3053991.07, 
    2591633.34, 2720567.16, 1764697.43, 1128645.17, 1323423.8, 
    1502345.65, 1612565.65, 1313644.68, 1233464.32, 1211636.57, 
    2061583.23, 1090641.31, 1428333.73, 1474186.08, 1607467.17, 
    1180338.86, 1307456.84, 1047247.48, 1657799.59, 1515859.51, 
    2682571.39, 1350357.67, 1513109.28, 1549993.26, 1812758.27, 
    1437857.75, 1620922.46, 2066836.55, 1248856.87, 1107869.92, 
    2884296.49, 1766246.4, 1496881.03, 1166676.17, 1376484.9, 
    1188265.97, 1985668.28, 1094000.85, 1352298.36, 1616352.89, 
    1856824.59, 1726768.06, 1823672, 972326.75)), class = "data.frame", row.names = c(NA, 
-53L))

As we can see sales_point_id has 53 rows. How to do that if for each sales_point_id (this is group variable)count of rows >52, then for variable line_fact_amt sum the values of the rows that come after line 52. In this example

sales_point_id  calendar_id_operday line_fact_amt
2               52                  1823672
2               53                  972326,75 

sum line_fact_amt for 52 and 53 1823672 972326,75=2795998,75, so this value will be put for 52 rows of calendar_id_operday.

So Desired output

sales_point_id  calendar_id_operday line_fact_amt
2   1   2026319.95
2   2   2509383.87
2   3   2859932.59
2   4   2652536.9
2   5   2144475.76
2   6   2844606.9
2   7   3293532.23
2   8   2383631.93
2   9   3053991.07
2   10  2591633.34
2   11  2720567.16
2   12  1764697.43
2   13  1128645.17
2   14  1323423.8
2   15  1502345.65
2   16  1612565.65
2   17  1313644.68
2   18  1233464.32
2   19  1211636.57
2   20  2061583.23
2   21  1090641.31
2   22  1428333.73
2   23  1474186.08
2   24  1607467.17
2   25  1180338.86
2   26  1307456.84
2   27  1047247.48
2   28  1657799.59
2   29  1515859.51
2   30  2682571.39
2   31  1350357.67
2   32  1513109.28
2   33  1549993.26
2   34  1812758.27
2   35  1437857.75
2   36  1620922.46
2   37  2066836.55
2   38  1248856.87
2   39  1107869.92
2   40  2884296.49
2   41  1766246.4
2   42  1496881.03
2   43  1166676.17
2   44  1376484.9
2   45  1188265.97
2   46  1985668.28
2   47  1094000.85
2   48  1352298.36
2   49  1616352.89
2   50  1856824.59
2   51  1726768.06
2   52  2795998,75

for each sales_point_id by all variables must be 52 rows. How to do it? Thank you advance.

CodePudding user response:

Something like this will replace line_fact_amt in all rows starting from 52 with their sum.

g <- seq.int(nrow(mydata))
i <- as.logical(g %/% 52)
mydata[i, "line_fact_amt"] <- sum(mydata[i, "line_fact_amt"])
tail(mydata)
#>    sales_point_id calendar_id_operday line_fact_amt
#> 48              2                  48       1352298
#> 49              2                  49       1616353
#> 50              2                  50       1856825
#> 51              2                  51       1726768
#> 52              2                  52       2795999
#> 53              2                  53       2795999

Created on 2022-03-22 by the reprex package (v2.0.1)

In order to keep only the first 52 rows, do

head(mydata, 52)

CodePudding user response:

Using dplyr's n, which gives the current group size, we can sum values based on index, and append that to the first 51 rows of data.

library(dplyr)

mydata %>%
  group_by(sales_point_id) %>%
  summarise(
    calendar_id_operday = if(n() > 52) 1:52 else calendar_id_operday,
    line_fact_amt = if(n() > 52) {
      c(line_fact_amt[1:51], sum(line_fact_amt[52:n()]))
    } else {
      line_fact_amt
    },
    .groups = "drop",
  )
  • Related