Hi, I would like to get only one row for each Code. The criterion for picking that row is: take the nearest Refresh Date that is >= Effective Date. If there is no Refresh Date that is >= Effective Date, then take the nearest Refresh Date < Effective Date. Below is my sample data frame.
Code <- c("A","A","A", "A", "B", "B", "B", "B", "C","C","C","C")
Effective_Date <- as.Date(c("2020-08-25","2020-08-25","2020-08-25","2020-08-25","2021-12-18","2021-12-18",
"2021-12-18","2021-12-18","2021-10-15","2021-10-15","2021-10-15","2021-10-15"))
Refresh_Date <- as.Date(c("2020-09-25","2021-09-17","2022-11-25","2020-02-20","2021-12-12","2021-12-18",
"2022-01-15","2021-08-19","2021-08-20","2020-08-25","2021-09-30","2020-08-25"))
DF <- data.frame(Code,Effective_Date,Refresh_Date)
> DF
Code Effective_Date Refresh_Date
1 A 2020-08-25 2020-09-25
2 A 2020-08-25 2021-09-17
3 A 2020-08-25 2022-11-25
4 A 2020-08-25 2020-02-20
5 B 2021-12-18 2021-12-12
6 B 2021-12-18 2021-12-18
7 B 2021-12-18 2022-01-15
8 B 2021-12-18 2021-08-19
9 C 2021-10-15 2021-08-20
10 C 2021-10-15 2020-08-25
11 C 2021-10-15 2021-09-30
12 C 2021-10-15 2020-08-25
It's like aggregating by Code and Effective Date, but keeping the row that has the nearest Refresh Date >= Effective Date. If there is no Refresh Date that is >= Effective Date, keep the row with the nearest Refresh Date < Effective Date. Below is my desired output:
> DF_DesiredOutput
Code Effective_Date Refresh_Date
1 A 2020-08-25 2020-09-25
2 B 2021-12-18 2021-12-18
3 C 2021-10-15 2021-09-30
CodePudding user response:
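We can group by 'Code', take the absolute difference between 'Refresh_Date' and 'Effective_Date', get the index of the minimum value with which.min, and pass that index to slice. Note that the absolute difference treats dates before and after 'Effective_Date' alike; for the sample data this still gives the desired output.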
library(dplyr)
DF %>%
  group_by(Code) %>%
  slice(which.min(abs(Refresh_Date - Effective_Date))) %>%
  ungroup()
Output:
# A tibble: 3 × 3
Code Effective_Date Refresh_Date
<chr> <date> <date>
1 A 2020-08-25 2020-09-25
2 B 2021-12-18 2021-12-18
3 C 2021-10-15 2021-09-30
CodePudding user response:
Here is an alternative approach using arrange by the absolute difference and then slice:
library(dplyr)
DF %>%
  group_by(Code) %>%
  arrange(abs(Refresh_Date - Effective_Date), .by_group = TRUE) %>%
  slice(1)
Code Effective_Date Refresh_Date
<chr> <date> <date>
1 A 2020-08-25 2020-09-25
2 B 2021-12-18 2021-12-18
3 C 2021-10-15 2021-09-30
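Both answers above rank rows by absolute difference only, so a Refresh Date one day before the Effective Date would beat one five days after it. That happens to match the desired output for the sample data, but it is not quite the rule stated in the question. Below is a minimal sketch of one way to prefer dates >= Effective Date and fall back to earlier dates only when none exist. The data frame DF2 and its values are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical data (not from the question): for code "X" the closest refresh
# date overall is one day BEFORE the effective date, but a later one exists.
# For code "Y" every refresh date is before the effective date.
DF2 <- data.frame(
  Code = c("X", "X", "Y", "Y"),
  Effective_Date = as.Date(c("2021-01-10", "2021-01-10", "2021-06-01", "2021-06-01")),
  Refresh_Date = as.Date(c("2021-01-09", "2021-01-15", "2021-05-01", "2021-05-20"))
)

result <- DF2 %>%
  group_by(Code, Effective_Date) %>%
  # First key: FALSE sorts before TRUE, so rows with
  # Refresh_Date >= Effective_Date come first.
  # Second key: within each side, the smallest gap in days comes first.
  arrange(Refresh_Date < Effective_Date,
          abs(as.numeric(Refresh_Date - Effective_Date)),
          .by_group = TRUE) %>%
  slice(1) %>%
  ungroup()

result
```

With this ordering, "X" keeps 2021-01-15 (the nearest date on or after the Effective Date) and "Y" falls back to 2021-05-20 (the nearest earlier date). Grouping by both Code and Effective_Date is an assumption based on the question's "aggregating by Code and Effective Date" wording; grouping by Code alone would behave the same on the sample data.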