Example of mydata
mydata=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = 1:53,
line_fact_amt = c(2026319.95, 2509383.87, 2859932.59, 2652536.9,
2144475.76, 2844606.9, 3293532.23, 2383631.93, 3053991.07,
2591633.34, 2720567.16, 1764697.43, 1128645.17, 1323423.8,
1502345.65, 1612565.65, 1313644.68, 1233464.32, 1211636.57,
2061583.23, 1090641.31, 1428333.73, 1474186.08, 1607467.17,
1180338.86, 1307456.84, 1047247.48, 1657799.59, 1515859.51,
2682571.39, 1350357.67, 1513109.28, 1549993.26, 1812758.27,
1437857.75, 1620922.46, 2066836.55, 1248856.87, 1107869.92,
2884296.49, 1766246.4, 1496881.03, 1166676.17, 1376484.9,
1188265.97, 1985668.28, 1094000.85, 1352298.36, 1616352.89,
1856824.59, 1726768.06, 1823672, 972326.75)), class = "data.frame", row.names = c(NA,
-53L))
As we can see sales_point_id
has 53 rows. How to do that if for each sales_point_id
(this is group variable)count of rows >52, then for variable line_fact_amt
sum the values of the rows that come after line 52.
In this example
sales_point_id calendar_id_operday line_fact_amt
2 52 1823672
2 53 972326,75
sum line_fact_amt
for 52 and 53 1823672 972326,75=2795998,75
, so this value will be put for 52 rows of calendar_id_operday
.
So Desired output
sales_point_id calendar_id_operday line_fact_amt
2 1 2026319.95
2 2 2509383.87
2 3 2859932.59
2 4 2652536.9
2 5 2144475.76
2 6 2844606.9
2 7 3293532.23
2 8 2383631.93
2 9 3053991.07
2 10 2591633.34
2 11 2720567.16
2 12 1764697.43
2 13 1128645.17
2 14 1323423.8
2 15 1502345.65
2 16 1612565.65
2 17 1313644.68
2 18 1233464.32
2 19 1211636.57
2 20 2061583.23
2 21 1090641.31
2 22 1428333.73
2 23 1474186.08
2 24 1607467.17
2 25 1180338.86
2 26 1307456.84
2 27 1047247.48
2 28 1657799.59
2 29 1515859.51
2 30 2682571.39
2 31 1350357.67
2 32 1513109.28
2 33 1549993.26
2 34 1812758.27
2 35 1437857.75
2 36 1620922.46
2 37 2066836.55
2 38 1248856.87
2 39 1107869.92
2 40 2884296.49
2 41 1766246.4
2 42 1496881.03
2 43 1166676.17
2 44 1376484.9
2 45 1188265.97
2 46 1985668.28
2 47 1094000.85
2 48 1352298.36
2 49 1616352.89
2 50 1856824.59
2 51 1726768.06
2 52 2795998,75
for each sales_point_id
by all variables must be 52 rows.
How to do it?
Thank you advance.
CodePudding user response:
Something like this will replace line_fact_amt
in all rows starting from 52 with their sum.
g <- seq.int(nrow(mydata))
i <- as.logical(g %/% 52)
mydata[i, "line_fact_amt"] <- sum(mydata[i, "line_fact_amt"])
tail(mydata)
#> sales_point_id calendar_id_operday line_fact_amt
#> 48 2 48 1352298
#> 49 2 49 1616353
#> 50 2 50 1856825
#> 51 2 51 1726768
#> 52 2 52 2795999
#> 53 2 53 2795999
Created on 2022-03-22 by the reprex package (v2.0.1)
In order to keep only the first 52 rows, do
head(mydata, 52)
CodePudding user response:
Using dplyr
's n
, which gives the current group size, we can sum values based on index, and append that to the first 51 rows of data.
library(dplyr)
mydata %>%
group_by(sales_point_id) %>%
summarise(
calendar_id_operday = if(n() > 52) 1:52 else calendar_id_operday,
line_fact_amt = if(n() > 52) {
c(line_fact_amt[1:51], sum(line_fact_amt[52:n()]))
} else {
line_fact_amt
},
.groups = "drop",
)