R: Get unique values based on criteria from 2 other columns

Hi, I would like to keep only one row for each Code. The criterion for choosing that row is: take the nearest Refresh Date that is >= Effective Date. If there is no Refresh Date that is >= Effective Date, take the nearest Refresh Date < Effective Date instead. Below is my sample data frame.

Code <- c("A","A","A", "A", "B", "B", "B", "B", "C","C","C","C")

Effective_Date <- as.Date(c("2020-08-25","2020-08-25","2020-08-25","2020-08-25","2021-12-18","2021-12-18",
                            "2021-12-18","2021-12-18","2021-10-15","2021-10-15","2021-10-15","2021-10-15"))

Refresh_Date <- as.Date(c("2020-09-25","2021-09-17","2022-11-25","2020-02-20","2021-12-12","2021-12-18",
                            "2022-01-15","2021-08-19","2021-08-20","2020-08-25","2021-09-30","2020-08-25"))

DF <- data.frame(Code,Effective_Date,Refresh_Date)
> DF
   Code Effective_Date Refresh_Date
1     A     2020-08-25   2020-09-25
2     A     2020-08-25   2021-09-17
3     A     2020-08-25   2022-11-25
4     A     2020-08-25   2020-02-20
5     B     2021-12-18   2021-12-12
6     B     2021-12-18   2021-12-18
7     B     2021-12-18   2022-01-15
8     B     2021-12-18   2021-08-19
9     C     2021-10-15   2021-08-20
10    C     2021-10-15   2020-08-25
11    C     2021-10-15   2021-09-30
12    C     2021-10-15   2020-08-25

This is like aggregating by Code and Effective Date, but keeping the row with the nearest Refresh Date >= Effective Date. If there is no Refresh Date that is >= Effective Date, keep the row with the nearest Refresh Date < Effective Date. Below is my desired output:

> DF_DesiredOutput
  Code Effective_Date Refresh_Date
1    A     2020-08-25   2020-09-25
2    B     2021-12-18   2021-12-18
3    C     2021-10-15   2021-09-30

CodePudding user response:

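We can group by 'Code', then use slice() with which.min() on the absolute difference between 'Refresh_Date' and 'Effective_Date' to keep the row with the smallest gap. Note that this picks the nearest date in either direction, which matches the desired output for the sample data.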

library(dplyr)

DF %>%
  group_by(Code) %>%
  # keep the row with the smallest absolute gap in each group
  slice(which.min(abs(Refresh_Date - Effective_Date))) %>%
  ungroup()

Output:

# A tibble: 3 × 3
  Code  Effective_Date Refresh_Date
  <chr> <date>         <date>      
1 A     2020-08-25     2020-09-25  
2 B     2021-12-18     2021-12-18  
3 C     2021-10-15     2021-09-30
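
Because abs() treats dates before and after Effective_Date the same way, the answer above returns the closest date in either direction. That matches the sample data, but it can differ from the rule stated in the question when an earlier Refresh_Date is closer than any later one. A minimal sketch of one way to enforce the stated rule follows. The `toy` data frame, the result name `strict`, and the helper columns `gap` and `is_before` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical data: for A, the earlier date (1 day before) is closer
# than the later one (10 days after); for B, every date is earlier.
toy <- data.frame(
  Code = c("A", "A", "B", "B"),
  Effective_Date = as.Date(rep("2021-01-10", 4)),
  Refresh_Date = as.Date(c("2021-01-09", "2021-01-20", "2021-01-01", "2021-01-05"))
)

strict <- toy %>%
  mutate(
    gap = as.numeric(Refresh_Date - Effective_Date),  # signed gap in days
    is_before = gap < 0                               # TRUE when refresh precedes effective
  ) %>%
  group_by(Code, Effective_Date) %>%
  # FALSE sorts before TRUE, so on-or-after dates come first,
  # then the smallest absolute gap within each class.
  arrange(is_before, abs(gap), .by_group = TRUE) %>%
  slice(1) %>%
  ungroup() %>%
  select(-gap, -is_before)

strict
```

Under these assumptions, A keeps 2021-01-20 (on or after the effective date) rather than the closer 2021-01-09, and B falls back to 2021-01-05, its nearest earlier date.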

CodePudding user response:

Here is an alternative approach: arrange() each group by the absolute difference, then slice() the first row:

library(dplyr)

DF %>% 
  group_by(Code) %>% 
  arrange(abs(Refresh_Date-Effective_Date), .by_group = TRUE) %>% 
  slice(1)

  Code  Effective_Date Refresh_Date
  <chr> <date>         <date>      
1 A     2020-08-25     2020-09-25  
2 B     2021-12-18     2021-12-18  
3 C     2021-10-15     2021-09-30
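
Both answers return one row per group even when several rows tie for the smallest gap, and which row wins depends on row order. As a hedged sketch, assuming dplyr 1.0.0 or later, slice_min() can express the same idea in one step, and with_ties = FALSE makes the one-row-per-group behavior explicit. The data frame is rebuilt from the question's code; `nearest` is a name introduced here for illustration.

```r
library(dplyr)

# Sample data as defined in the question
Code <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C")
Effective_Date <- as.Date(c("2020-08-25", "2020-08-25", "2020-08-25", "2020-08-25",
                            "2021-12-18", "2021-12-18", "2021-12-18", "2021-12-18",
                            "2021-10-15", "2021-10-15", "2021-10-15", "2021-10-15"))
Refresh_Date <- as.Date(c("2020-09-25", "2021-09-17", "2022-11-25", "2020-02-20",
                          "2021-12-12", "2021-12-18", "2022-01-15", "2021-08-19",
                          "2021-08-20", "2020-08-25", "2021-09-30", "2020-08-25"))
DF <- data.frame(Code, Effective_Date, Refresh_Date)

nearest <- DF %>%
  group_by(Code) %>%
  # order by the absolute gap in days and keep exactly one row per group
  slice_min(abs(as.numeric(Refresh_Date - Effective_Date)),
            n = 1, with_ties = FALSE) %>%
  ungroup()

nearest
```

Like the two answers, this measures distance in both directions, so it does not by itself prefer dates on or after Effective_Date.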