Home > Mobile >  Why does rolling sum with map_dbl() work in one case but not the other?
Why does rolling sum with map_dbl() work in one case but not the other?

Time:12-04

I've been trying to understand why the map_dbl() works for one situation but doesn't for another (to my knowledge) almost identical situation. Here is where it does work:

facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L, 
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), aux = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(169L, 201L, 202L, 200L, 348L, 
347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")

library(dplyr)
library(lubridate)
library(purrr)

facilities <- facilities %>%
  mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
  group_by(aux) %>%
  mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(aux[
           facilitystartdate > (. -  months(36)) & facilitystartdate <= .]))) %>% ungroup() 

facilities <- facilities %>%
  select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals"

These are the first 30 rows of about 24 000.

Next I try to do the same but the column to be summed is for larger numbers and for some reason it does not work. What am I not seeing?

facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L, 
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), facility_usd = c(122, 
690, 800, 225, 58, 360, 70, 124.5, 200, 44.899425, 115.455663, 120, 173.4, 50, 32, 68.7, 465, 26.638, 48, 8.245, 50, 143.9, 525, 90, 50, 40, 30, 345, 10, 7.5)), row.names = c(169L, 201L, 
202L, 200L, 348L, 347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")

facilities <- facilities %>%
  mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
  group_by(facility_usd) %>%
  mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(facility_usd[
           facilitystartdate > (. -  months(36)) & facilitystartdate <= .]))) 

facilities <- facilities %>%
  ungroup() %>%
  select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals_USD"

Where the code now works was what I initially tried to do and Ronak Shah's answer here helped me. But I don't understand why the same solution would not work in this second case. Could someone help me out?

Update: Here is what the first, working one looks like. The All_3Y_deals has the sum of the preceding 3 years of the aux value.

  facilityid facilitystartdate   aux All_3Y_deals
        <int> <date>            <dbl>        <dbl>
 1      11983 1981-06-15            1            1
 2      12937 1982-06-29            1            2
 3      12960 1984-04-06            1            3
 4      12927 1984-12-10            1            3
 5      14201 1986-01-01            1            3
 6      14200 1986-05-22            1            4
 7      13539 1987-01-01            1            7
 8      13543 1987-01-01            1            7
 9      13548 1987-01-01            1            7
10     115930 1987-02-01            1            9
11     115931 1987-02-01            1            9
12      13542 1987-03-01            1           10
13      13547 1987-04-24            1           10
14       1243 1987-05-28            1           11
15      13535 1987-06-23            1           12
16      13538 1987-07-01            1           14
17      13553 1987-07-01            1           14
18      13534 1987-08-01            1           15
19      13537 1987-09-01            1           17
20      14047 1987-09-01            1           17
21        897 1987-10-01            1           18
22      13544 1987-10-21            1           19
23       2410 1987-10-22            1           20
24      62175 1987-11-01            1           21
25      13540 1987-11-03            1           23
26      13541 1987-11-03            1           23
27        898 1987-12-01            1           29
28       1059 1987-12-01            1           29
29      13530 1987-12-01            1           29
30      13531 1987-12-01            1           29

Update: And this is the expected result for the second:

   facilityid facilitystartdate facility_usd All_3Y_deals_USD
      <int> <date>                   <dbl>            <dbl>
 1      11983 1981-06-15              122              122   
 2      12937 1982-06-29              690              812   
 3      12960 1984-04-06              800              1612   
 4      12927 1984-12-10              225              1715   
 5      14201 1986-01-01               58              1083   
 6      14200 1986-05-22              360              2255  
 7      13539 1987-01-01               70              2325  
 8      13543 1987-01-01              124.             2449  
 9      13548 1987-01-01              200              2649  
10     115930 1987-02-01                5.76           2654.76
11     115931 1987-02-01               14.8            2669.56 
12      13542 1987-03-01              120              2789.56  
13      13547 1987-04-24              173.             2162.56
14       1243 1987-05-28               50              3012.56
15      13535 1987-06-23               32              3044.56
16      13538 1987-07-01               68.7            3113.26
17      13553 1987-07-01              465              3578.26
18      13534 1987-08-01               26.6            3604.86
19      13537 1987-09-01               48              3652.86
20      14047 1987-09-01                8.24           3661.1
21        897 1987-10-01               50              3761.1
22      13544 1987-10-21              144.             3905.1  
23       2410 1987-10-22              525              4430.1
24      62175 1987-11-01               90              4520.1
25      13540 1987-11-03               50              4670.1
26      13541 1987-11-03               40              4710.1
27        898 1987-12-01               30              4740.1
28       1059 1987-12-01              345              5085.1  
29      13530 1987-12-01               10              5095.1
30      13531 1987-12-01                7.5            5102.6

I'm trying to get a 3 year rolling sum of facility_usd for the All_3Y_deals_USD column

CodePudding user response:

In the first case you group by aux and get only one group with 30 observations.

In the second case you group by facility_usd and get 28 groups since most values are unique. In particular if you look at the first row in the second case you have facility_usd = 122. There is only one observation like this: therefore you get that `cum1 = 122' because all other rows belong to a different group.

So everything works fine.

  • Related