So I have a main dataframe 'df' that contains date ranges grouped by categories 'name' and 'values'. Values is a subcategory of name.
I have a second dataframe with a list of dates also by category, 'filterdf'. What I need to do is say given data from filterdf$baddates, exclude all dates in gooddates 0 to 2 days before the date in baddates by category.
df <- data.frame (name = c("name_1", "name_1", "name_2", "name_2", "name_2", "name_3", "name_3", "name_3"),
values = c("value_1", "value_1", "value_2","value_4","value_4","value_3","value_3","value_3"),
gooddates = c("2022-02-02","2022-02-03","2022-02-04","2022-02-03","2022-02-04","2022-02-03","2022-02-04","2022-02-06"))
name values gooddates
1 name_1 value_1 2022-02-02
2 name_1 value_1 2022-02-03
3 name_2 value_2 2022-02-04
4 name_2 value_4 2022-02-03
5 name_2 value_4 2022-02-04
6 name_3 value_3 2022-02-03
7 name_3 value_3 2022-02-04
8 name_3 value_3 2022-02-06
filterdf <- data.frame(name = c("name_1", "name_2", "name_3", "name_3"),
baddates = c("2022-02-03","2022-02-03","2022-02-04","2022-02-05"))
name baddates
1 name_1 2022-02-03
2 name_2 2022-02-03
3 name_3 2022-02-04
4 name_3 2022-02-05
Since I need an asymmetrical filter, I can't use the strategy I had hoped (when the dates were both in the original df), which is:
result <- df %>% filter( abs(baddates - gooddates) < 2 )
I need the result to be:
result <- data.frame (name = c( "name_2", "name_2","name_3"),
values = c( "value_2","value_4","value_3"),
gooddates = c("2022-02-04","2022-02-04","2022-02-06"))
name values gooddates
1 name_2 value_2 2022-02-04
2 name_2 value_4 2022-02-04
3 name_3 value_3 2022-02-06
This will be on a larger dataframe where name and values will need to be filtered by as a group on both name and value so I would like a dplyr solution if possible.
CodePudding user response:
Maybe this works with your data set
library(dplyr)
left_join(df, filterdf) %>%
mutate(dif = difftime(baddates, gooddates, units = "days")) %>%
filter(dif < 0 & dif >-2) %>%
select(name:gooddates)
Joining, by = "name"
name values gooddates
1 name_2 value_2 2022-02-04
2 name_2 value_4 2022-02-04
3 name_3 value_3 2022-02-06
CodePudding user response:
With dplyr
, we may do a join_by
library(dplyr)
df %>%
inner_join(filterdf, join_by(name,
closest(gooddates > baddates))) %>%
select(-baddates)
-output
name values gooddates
1 name_2 value_2 2022-02-04
2 name_2 value_4 2022-02-04
3 name_3 value_3 2022-02-06
Or use powerjoin
library(powerjoin)
df$gooddates <- as.Date(df$gooddates)
filterdf$baddates <- as.Date(filterdf$baddates)
power_inner_join(df, filterdf, by = c("name",
~ (.x$gooddates - .y$baddates) >= 1)) %>%
distinct(name, values, gooddates)
-output
name values gooddates
1 name_2 value_2 2022-02-04
2 name_2 value_4 2022-02-04
3 name_3 value_3 2022-02-06