Home > database >  Subset entries in each group based on dates and groups stored in another dataframe
Subset entries in each group based on dates and groups stored in another dataframe

Time:01-19

I want to remove rows in dataframe1 with a date that occurs before the date in dataframe2, based on the specific course.

Dataframe1 that looks like this:

Participant ID Course Date
1 BIO 110 2022-02-03
2 BIO 110 2022-02-01
3 CHEM 110 2022-03-15

Dataframe 2 looks like this:

Course Date
BIO 110 2022-02-03
CHEM 110 2022-03-15

The desired output:

Participant ID Course Date
1 BIO 110 2022-02-03
3 CHEM 110 2022-03-15

CodePudding user response:

You may try the below code

df <- df1 %>% left_join(df2, by='Course') %>% filter(Date.x>=Date.y)

CodePudding user response:

You will need to ensure that the dates are recognized as dates before filtering.

library(tidyverse)

df_1 <- tibble::tribble(
  ~Participant.ID,    ~Course,        ~Date,
                1L,  "BIO 110", "2022-02-03",
                2L,  "BIO 110", "2022-02-01",
                3L, "CHEM 110", "2022-03-15"
)

df_2 <- tibble::tribble(
     ~Course,        ~Date,
   "BIO 110", "2022-02-03",
  "CHEM 110", "2022-03-15"
) %>% 
  dplyr::rename(Min_Date = Date)

dplyr::left_join(df_1, df_2, by = "Course") %>% 
  dplyr::mutate(Date = lubridate::as_date(Date),
                Min_Date = lubridate::as_date(Min_Date)) %>% 
  dplyr::filter(Date >= Min_Date)

CodePudding user response:

Using purrr we can do:

library(tidyverse)

map2_dfr(df2$Course, df2$Date,
          ~ df1 %>% filter(Course == .x, Date >= .y))

## # A tibble: 2 x 3
##   Participant.ID Course   Date      
##            <int> <chr>    <chr>     
## 1              1 BIO 110  2022-02-03
## 2              3 CHEM 110 2022-03-15
  • Related