I've been trying to understand why the map_dbl()
works for one situation but doesn't for another (to my knowledge) almost identical situation. Here is where it does work:
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), aux = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(169L, 201L, 202L, 200L, 348L,
347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
library(dplyr)
library(lubridate)
library(purrr)
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(aux) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(aux[
facilitystartdate > (. - months(36)) & facilitystartdate <= .]))) %>% ungroup()
facilities <- facilities %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals"
These are the first 30 rows of about 24 000.
Next I try to do the same but the column to be summed is for larger numbers and for some reason it does not work. What am I not seeing?
facilities <- structure(list(facilityid = c(11983L, 12937L, 12960L, 12927L, 14201L, 14200L, 13539L, 13543L, 13548L, 115930L, 115931L, 13542L, 13547L, 1243L, 13535L, 13538L, 13553L, 13534L, 13537L, 14047L, 897L, 13544L, 2410L, 62175L, 13540L, 13541L, 898L, 1059L, 13530L,
13531L), facilitystartdate = structure(c(4183, 4562, 5209, 5457, 5844, 5985, 6209, 6209, 6209, 6240, 6240, 6268, 6322, 6356, 6382, 6390, 6390, 6421, 6452, 6452, 6482, 6502, 6503, 6513, 6515, 6515, 6543, 6543, 6543, 6543), class = "Date"), facility_usd = c(122,
690, 800, 225, 58, 360, 70, 124.5, 200, 44.899425, 115.455663, 120, 173.4, 50, 32, 68.7, 465, 26.638, 48, 8.245, 50, 143.9, 525, 90, 50, 40, 30, 345, 10, 7.5)), row.names = c(169L, 201L,
202L, 200L, 348L, 347L, 261L, 265L, 270L, 3645L, 3646L, 264L, 269L, 4L, 258L, 260L, 272L, 257L, 259L, 346L, 1L, 266L, 7L, 1144L, 262L, 263L, 2L, 3L, 254L, 255L), class = "data.frame")
facilities <- facilities %>%
mutate(facilitystartdate = as.Date(facilitystartdate, format = "%Y-%m-%d")) %>%
group_by(facility_usd) %>%
mutate(cum1 = purrr::map_dbl(facilitystartdate, ~ sum(facility_usd[
facilitystartdate > (. - months(36)) & facilitystartdate <= .])))
facilities <- facilities %>%
ungroup() %>%
select(facilityid, cum1)
colnames(facilities)[2] <- "All_3Y_deals_USD"
Where the code now works was what I initially tried to do and Ronak Shah's answer here helped me. But I don't understand why the same solution would not work in this second case. Could someone help me out?
Update:
Here is what the first, working one looks like. The All_3Y_deals has the sum of the preceding 3 years of the aux
value.
facilityid facilitystartdate aux All_3Y_deals
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 1 1
2 12937 1982-06-29 1 2
3 12960 1984-04-06 1 3
4 12927 1984-12-10 1 3
5 14201 1986-01-01 1 3
6 14200 1986-05-22 1 4
7 13539 1987-01-01 1 7
8 13543 1987-01-01 1 7
9 13548 1987-01-01 1 7
10 115930 1987-02-01 1 9
11 115931 1987-02-01 1 9
12 13542 1987-03-01 1 10
13 13547 1987-04-24 1 10
14 1243 1987-05-28 1 11
15 13535 1987-06-23 1 12
16 13538 1987-07-01 1 14
17 13553 1987-07-01 1 14
18 13534 1987-08-01 1 15
19 13537 1987-09-01 1 17
20 14047 1987-09-01 1 17
21 897 1987-10-01 1 18
22 13544 1987-10-21 1 19
23 2410 1987-10-22 1 20
24 62175 1987-11-01 1 21
25 13540 1987-11-03 1 23
26 13541 1987-11-03 1 23
27 898 1987-12-01 1 29
28 1059 1987-12-01 1 29
29 13530 1987-12-01 1 29
30 13531 1987-12-01 1 29
Update: And this is the expected result for the second:
facilityid facilitystartdate facility_usd All_3Y_deals_USD
<int> <date> <dbl> <dbl>
1 11983 1981-06-15 122 122
2 12937 1982-06-29 690 812
3 12960 1984-04-06 800 1612
4 12927 1984-12-10 225 1715
5 14201 1986-01-01 58 1083
6 14200 1986-05-22 360 2255
7 13539 1987-01-01 70 2325
8 13543 1987-01-01 124. 2449
9 13548 1987-01-01 200 2649
10 115930 1987-02-01 5.76 2654.76
11 115931 1987-02-01 14.8 2669.56
12 13542 1987-03-01 120 2789.56
13 13547 1987-04-24 173. 2162.56
14 1243 1987-05-28 50 3012.56
15 13535 1987-06-23 32 3044.56
16 13538 1987-07-01 68.7 3113.26
17 13553 1987-07-01 465 3578.26
18 13534 1987-08-01 26.6 3604.86
19 13537 1987-09-01 48 3652.86
20 14047 1987-09-01 8.24 3661.1
21 897 1987-10-01 50 3761.1
22 13544 1987-10-21 144. 3905.1
23 2410 1987-10-22 525 4430.1
24 62175 1987-11-01 90 4520.1
25 13540 1987-11-03 50 4670.1
26 13541 1987-11-03 40 4710.1
27 898 1987-12-01 30 4740.1
28 1059 1987-12-01 345 5085.1
29 13530 1987-12-01 10 5095.1
30 13531 1987-12-01 7.5 5102.6
I'm trying to get a 3 year rolling sum of facility_usd for the All_3Y_deals_USD column
CodePudding user response:
In the first case you group by aux
and get only one group with 30 observations.
In the second case you group by facility_usd
and get 28 groups since most values are unique.
In particular if you look at the first row in the second case you have facility_usd = 122
. There is only one observation like this: therefore you get that `cum1 = 122' because all other rows belong to a different group.
So everything works fine.