Home > Software engineering >  Select rows where dates in n columns match
Select rows where dates in n columns match

Time:10-27

Suppose I have a dataset:

Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
datesurg<-c("2022-03-17","2022-02-20","2022-01-23","2022-03-18","2022-04-17")
dateevent<-c("2022-03-20","2022-02-21","2022-03-17","2022-03-18","2022-04-17")
datelookup<-c("2022-03-20","2022-02-20","2022-01-23","2022-03-18","2022-04-17")

df <- data.frame(Name, Age,datesurg,dateevent,datelookup)

I want to have a dataframe where for each row of "datesurg" if this variable matches any date in the "datelookup" column, keep the row.

   Name Age   datesurg  dateevent datelookup
2  Bill  41 2022-02-20 2022-02-21 2022-02-20
3 Maria  32 2022-01-23 2022-03-17 2022-01-23
4   Ben  58 2022-03-18 2022-03-18 2022-03-18
5  Tina  26 2022-04-17 2022-04-17 2022-04-17

Next I want to keep the rows where "datesurg" or "dateevent" are equal to "datelookup".

   Name Age   datesurg  dateevent datelookup
1   Jon  23 2022-03-17 2022-03-20 2022-03-20
2  Bill  41 2022-02-20 2022-02-21 2022-02-20
3 Maria  32 2022-01-23 2022-03-17 2022-01-23
4   Ben  58 2022-03-18 2022-03-18 2022-03-18
5  Tina  26 2022-04-17 2022-04-17 2022-04-17

CodePudding user response:

We can use if_any

library(dplyr)
library(lubridate)
df %>%
   mutate(across(starts_with('date'), ymd)) %>% 
   filter(if_any(dateevent:datelookup, ~ datesurg == .x))

-output

   Name Age   datesurg  dateevent datelookup
1  Bill  41 2022-02-20 2022-02-21 2022-02-20
2 Maria  32 2022-01-23 2022-03-17 2022-01-23
3   Ben  58 2022-03-18 2022-03-18 2022-03-18
4  Tina  26 2022-04-17 2022-04-17 2022-04-17

or for the second case

df %>% 
  mutate(across(starts_with('date'), ymd)) %>% 
  filter(if_any(c(datesurg, dateevent), ~ datelookup == .x))

-output

  Name Age   datesurg  dateevent datelookup
1   Jon  23 2022-03-17 2022-03-20 2022-03-20
2  Bill  41 2022-02-20 2022-02-21 2022-02-20
3 Maria  32 2022-01-23 2022-03-17 2022-01-23
4   Ben  58 2022-03-18 2022-03-18 2022-03-18
5  Tina  26 2022-04-17 2022-04-17 2022-04-17

CodePudding user response:

Maybe try this with base R

"datesurg" in "datelookup"

df[with(df, datesurg %in% datelookup), ]
   Name Age   datesurg  dateevent datelookup
2  Bill  41 2022-02-20 2022-02-21 2022-02-20
3 Maria  32 2022-01-23 2022-03-17 2022-01-23
4   Ben  58 2022-03-18 2022-03-18 2022-03-18
5  Tina  26 2022-04-17 2022-04-17 2022-04-17

"datesurg" or "dateevent" in "datelookup"

df[with(df, datesurg %in% datelookup | dateevent %in% datelookup), ]
   Name Age   datesurg  dateevent datelookup
1   Jon  23 2022-03-17 2022-03-20 2022-03-20
2  Bill  41 2022-02-20 2022-02-21 2022-02-20
3 Maria  32 2022-01-23 2022-03-17 2022-01-23
4   Ben  58 2022-03-18 2022-03-18 2022-03-18
5  Tina  26 2022-04-17 2022-04-17 2022-04-17

CodePudding user response:

We could do it this way:

library(dplyr)
#1
df_surg <- df %>% 
  filter(datesurg %in% datelookup)
#2
df_surg_event <- df %>% 
  filter(datesurg == datelookup | dateevent == datelookup)
  • Related