Home > Enterprise >  filter DataFrame to exclude specific dates
filter DataFrame to exclude specific dates

Time:11-24

I want to know how to filter a DataFrame to exclude specific and discrete dates.

# Input Data
dates = c("2021-03-31", "2021-05-02", "2021-06-30", "2021-10-22")
dates = as.Date(dates)

x = structure(list(Gender = c("Male", "Female", "Male", "Male", "Female", 
"Male", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Male", "Male", "Female", 
"Female", "Female", "Male", "Female", "Female", "Male", "Female", 
"Male", "Female", "Female", "Female", "Male", "Male", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Male", 
"Female", "Male", "Female", "Male", "Female", "Female", "Female", 
"Male", "Male", "Female", "Female", "Female", "Male", "Male", 
"Female", "Female", "Female", "Male", "Female", "Male", "Female", 
"Male", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Male", "Female", "Female", "Female", "Female", 
"Female", "Male", "Male", "Female", "Male", "Female", "Female", 
"Male", "Female", "Female", "Female", "Female", "Female", "Male", 
"Female", "Female", "Male", "Female", "Female", "Female", "Female", 
"Female"), `Termination Date` = c("2021-01-05", "2021-02-12", 
"2021-02-22", "2021-02-24", "2021-03-12", "2021-03-12", "2021-03-24", 
"2021-03-26", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", 
"2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", 
"2021-04-02", "2021-04-02", "2021-04-05", "2021-04-09", "2021-04-30", 
"2021-05-05", "2021-05-11", "2021-05-11", "2021-05-14", "2021-05-21", 
"2021-05-21", "2021-05-24", "2021-06-01", "2021-06-11", "2021-06-11", 
"2021-06-14", "2021-06-24", "2021-06-27", "2021-06-27", "2021-06-27", 
"2021-06-27", "2021-07-02", "2021-07-07", "2021-07-23", "2021-07-26", 
"2021-07-26", "2021-07-27", "2021-07-30", "2021-08-02", "2021-08-06", 
"2021-08-06", "2021-08-09", "2021-08-11", "2021-08-13", "2021-08-13", 
"2021-08-13", "2021-08-13", "2021-08-16", "2021-08-18", "2021-08-20", 
"2021-08-23", "2021-08-24", "2021-08-25", "2021-08-27", "2021-08-27", 
"2021-08-30", "2021-08-30", "2021-08-31", "2021-09-01", "2021-09-03", 
"2021-09-03", "2021-09-15", "2021-09-16", "2021-09-20", "2021-09-22", 
"2021-09-23", "2021-09-23", "2021-09-24", "2021-09-24", "2021-10-01", 
"2021-10-04", "2021-10-06", "2021-10-08", "2021-10-08", "2021-10-08", 
"2021-10-11", "2021-10-14", "2021-10-19", "2021-10-20", "2021-10-21", 
"2021-10-22", "2021-10-22", "2021-10-29", "2021-11-02", "2021-11-03", 
"2021-11-08", "2021-11-09", "2021-11-16", "2021-11-16", "2021-11-17"
)), row.names = c(229L, 8247L, 3068L, 7222L, 3746L, 3912L, 8019L, 
3610L, 6078L, 6085L, 6271L, 6284L, 6285L, 6310L, 6321L, 6335L, 
6336L, 3697L, 9149L, 8217L, 3734L, 220L, 6729L, 5562L, 7729L, 
7933L, 5291L, 7232L, 1647L, 7335L, 3418L, 7189L, 2912L, 7790L, 
6088L, 6247L, 6281L, 6338L, 7608L, 6614L, 410L, 2746L, 8296L, 
3117L, 177L, 2788L, 3301L, 6221L, 5173L, 2092L, 3577L, 6219L, 
6973L, 9020L, 1274L, 1768L, 8218L, 1822L, 2499L, 8107L, 1910L, 
4756L, 2739L, 7342L, 7857L, 6519L, 2104L, 3666L, 7506L, 2635L, 
3402L, 5566L, 2637L, 3036L, 2976L, 3871L, 8376L, 3112L, 4772L, 
6449L, 8200L, 8445L, 3310L, 4005L, 3219L, 8241L, 8266L, 2995L, 
3273L, 8401L, 3336L, 3118L, 2272L, 3333L, 3370L, 3952L, 7339L
), class = "data.frame")

Normally, I would do the following but I assume it doesn't work in this case since I am using a date class. How would I do this using dates?

# Filter df to exclude rows that were entered on a date from the list 
x[!(x$`Termination Date` %in% dates), ]

CodePudding user response:

When I run your example data, I see the Termination Date column is interpreted as the character class, not the date class.

Here is a solution that uses the tidyverse:

# Input Data
dates = c("2021-03-31", "2021-05-02", "2021-06-30", "2021-10-22")

x = structure(list(Gender = c("Male", "Female", "Male", "Male", "Female", 
                              "Male", "Female", "Female", "Female", "Female", "Female", "Female", 
                              "Female", "Female", "Female", "Female", "Male", "Male", "Female", 
                              "Female", "Female", "Male", "Female", "Female", "Male", "Female", 
                              "Male", "Female", "Female", "Female", "Male", "Male", "Female", 
                              "Female", "Female", "Female", "Female", "Female", "Female", "Male", 
                              "Female", "Male", "Female", "Male", "Female", "Female", "Female", 
                              "Male", "Male", "Female", "Female", "Female", "Male", "Male", 
                              "Female", "Female", "Female", "Male", "Female", "Male", "Female", 
                              "Male", "Female", "Female", "Female", "Female", "Female", "Female", 
                              "Female", "Female", "Male", "Female", "Female", "Female", "Female", 
                              "Female", "Male", "Male", "Female", "Male", "Female", "Female", 
                              "Male", "Female", "Female", "Female", "Female", "Female", "Male", 
                              "Female", "Female", "Male", "Female", "Female", "Female", "Female", 
                              "Female"), `Termination Date` = c("2021-01-05", "2021-02-12", 
                                                                "2021-02-22", "2021-02-24", "2021-03-12", "2021-03-12", "2021-03-24", 
                                                                "2021-03-26", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", 
                                                                "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", "2021-03-31", 
                                                                "2021-04-02", "2021-04-02", "2021-04-05", "2021-04-09", "2021-04-30", 
                                                                "2021-05-05", "2021-05-11", "2021-05-11", "2021-05-14", "2021-05-21", 
                                                                "2021-05-21", "2021-05-24", "2021-06-01", "2021-06-11", "2021-06-11", 
                                                                "2021-06-14", "2021-06-24", "2021-06-27", "2021-06-27", "2021-06-27", 
                                                                "2021-06-27", "2021-07-02", "2021-07-07", "2021-07-23", "2021-07-26", 
                                                                "2021-07-26", "2021-07-27", "2021-07-30", "2021-08-02", "2021-08-06", 
                                                                "2021-08-06", "2021-08-09", "2021-08-11", "2021-08-13", "2021-08-13", 
                                                                "2021-08-13", "2021-08-13", "2021-08-16", "2021-08-18", "2021-08-20", 
                                                                "2021-08-23", "2021-08-24", "2021-08-25", "2021-08-27", "2021-08-27", 
                                                                "2021-08-30", "2021-08-30", "2021-08-31", "2021-09-01", "2021-09-03", 
                                                                "2021-09-03", "2021-09-15", "2021-09-16", "2021-09-20", "2021-09-22", 
                                                                "2021-09-23", "2021-09-23", "2021-09-24", "2021-09-24", "2021-10-01", 
                                                                "2021-10-04", "2021-10-06", "2021-10-08", "2021-10-08", "2021-10-08", 
                                                                "2021-10-11", "2021-10-14", "2021-10-19", "2021-10-20", "2021-10-21", 
                                                                "2021-10-22", "2021-10-22", "2021-10-29", "2021-11-02", "2021-11-03", 
                                                                "2021-11-08", "2021-11-09", "2021-11-16", "2021-11-16", "2021-11-17"
                              )), row.names = c(229L, 8247L, 3068L, 7222L, 3746L, 3912L, 8019L, 
                                                3610L, 6078L, 6085L, 6271L, 6284L, 6285L, 6310L, 6321L, 6335L, 
                                                6336L, 3697L, 9149L, 8217L, 3734L, 220L, 6729L, 5562L, 7729L, 
                                                7933L, 5291L, 7232L, 1647L, 7335L, 3418L, 7189L, 2912L, 7790L, 
                                                6088L, 6247L, 6281L, 6338L, 7608L, 6614L, 410L, 2746L, 8296L, 
                                                3117L, 177L, 2788L, 3301L, 6221L, 5173L, 2092L, 3577L, 6219L, 
                                                6973L, 9020L, 1274L, 1768L, 8218L, 1822L, 2499L, 8107L, 1910L, 
                                                4756L, 2739L, 7342L, 7857L, 6519L, 2104L, 3666L, 7506L, 2635L, 
                                                3402L, 5566L, 2637L, 3036L, 2976L, 3871L, 8376L, 3112L, 4772L, 
                                                6449L, 8200L, 8445L, 3310L, 4005L, 3219L, 8241L, 8266L, 2995L, 
                                                3273L, 8401L, 3336L, 3118L, 2272L, 3333L, 3370L, 3952L, 7339L
                              ), class = "data.frame")


library(dplyr)

df_without_undesired_dates <- x %>%
  filter(!`Termination Date` %in% dates)

  •  Tags:  
  • r
  • Related