How to sum values if the number of rows exceeds the desired number in R-CodePudding

Example of mydata

mydata=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = 1:53, 
    line_fact_amt = c(2026319.95, 2509383.87, 2859932.59, 2652536.9, 
    2144475.76, 2844606.9, 3293532.23, 2383631.93, 3053991.07, 
    2591633.34, 2720567.16, 1764697.43, 1128645.17, 1323423.8, 
    1502345.65, 1612565.65, 1313644.68, 1233464.32, 1211636.57, 
    2061583.23, 1090641.31, 1428333.73, 1474186.08, 1607467.17, 
    1180338.86, 1307456.84, 1047247.48, 1657799.59, 1515859.51, 
    2682571.39, 1350357.67, 1513109.28, 1549993.26, 1812758.27, 
    1437857.75, 1620922.46, 2066836.55, 1248856.87, 1107869.92, 
    2884296.49, 1766246.4, 1496881.03, 1166676.17, 1376484.9, 
    1188265.97, 1985668.28, 1094000.85, 1352298.36, 1616352.89, 
    1856824.59, 1726768.06, 1823672, 972326.75)), class = "data.frame", row.names = c(NA, 
-53L))

As we can see sales_point_id has 53 rows. How to do that if for each sales_point_id (this is group variable)count of rows >52, then for variable line_fact_amt sum the values of the rows that come after line 52. In this example

sales_point_id  calendar_id_operday line_fact_amt
2               52                  1823672
2               53                  972326,75

sum line_fact_amt for 52 and 53 1823672 972326,75=2795998,75, so this value will be put for 52 rows of calendar_id_operday.

So Desired output

sales_point_id  calendar_id_operday line_fact_amt
2   1   2026319.95
2   2   2509383.87
2   3   2859932.59
2   4   2652536.9
2   5   2144475.76
2   6   2844606.9
2   7   3293532.23
2   8   2383631.93
2   9   3053991.07
2   10  2591633.34
2   11  2720567.16
2   12  1764697.43
2   13  1128645.17
2   14  1323423.8
2   15  1502345.65
2   16  1612565.65
2   17  1313644.68
2   18  1233464.32
2   19  1211636.57
2   20  2061583.23
2   21  1090641.31
2   22  1428333.73
2   23  1474186.08
2   24  1607467.17
2   25  1180338.86
2   26  1307456.84
2   27  1047247.48
2   28  1657799.59
2   29  1515859.51
2   30  2682571.39
2   31  1350357.67
2   32  1513109.28
2   33  1549993.26
2   34  1812758.27
2   35  1437857.75
2   36  1620922.46
2   37  2066836.55
2   38  1248856.87
2   39  1107869.92
2   40  2884296.49
2   41  1766246.4
2   42  1496881.03
2   43  1166676.17
2   44  1376484.9
2   45  1188265.97
2   46  1985668.28
2   47  1094000.85
2   48  1352298.36
2   49  1616352.89
2   50  1856824.59
2   51  1726768.06
2   52  2795998,75

for each sales_point_id by all variables must be 52 rows. How to do it? Thank you advance.

CodePudding user response：

Something like this will replace line_fact_amt in all rows starting from 52 with their sum.

g <- seq.int(nrow(mydata))
i <- as.logical(g %/% 52)
mydata[i, "line_fact_amt"] <- sum(mydata[i, "line_fact_amt"])
tail(mydata)
#>    sales_point_id calendar_id_operday line_fact_amt
#> 48              2                  48       1352298
#> 49              2                  49       1616353
#> 50              2                  50       1856825
#> 51              2                  51       1726768
#> 52              2                  52       2795999
#> 53              2                  53       2795999

^{Created on 2022-03-22 by the reprex package (v2.0.1)}

In order to keep only the first 52 rows, do

head(mydata, 52)

CodePudding user response：

Using dplyr's n, which gives the current group size, we can sum values based on index, and append that to the first 51 rows of data.

library(dplyr)

mydata %>%
  group_by(sales_point_id) %>%
  summarise(
    calendar_id_operday = if(n() > 52) 1:52 else calendar_id_operday,
    line_fact_amt = if(n() > 52) {
      c(line_fact_amt[1:51], sum(line_fact_amt[52:n()]))
    } else {
      line_fact_amt
    },
    .groups = "drop",
  )