Home > Mobile >  r filter dates with NAs
r filter dates with NAs

Time:10-12

I am dealing with a dataset like this

Id   Col1   Date   FromDate   ToDate     Weight
1    Blue   2018   NA         NA         24.9
2    Green  2018   NA         NA         14.5
3    Red    2010   2009-05-15 2015-09-15 87.5
3    Red    2010   2015-10-15 2020-08-15 43.8
4    Yellow 2020   2019-10-15 2021-05-15 29.5
5    Purple 2011   NA         NA         NA
  • I like to retain rows that are NA in From and To Date column because there is nothing to compare the Date values in column Date and

  • If From and To Date is not missing then retain only those where the Date value is between the From and To Date.

Based on the logic the expected dataset would be like this

Id   Col1   Date   FromDate   ToDate     Weight
1    Blue   2018   NA         NA         24.9
2    Green  2018   NA         NA         14.5
3    Red    2010   2009-05-15 2015-09-15 87.5

4    Yellow 2020   2019-10-15 2021-05-15 29.5
5    Purple 2011   NA         NA         NA

I know I can use the filter function but I am not sure how to make it work with the NA values not excluded. Any suggestion is much appreciated.

CodePudding user response:

You should be able to do like this:

df <- tribble(
  ~Date, ~From, ~To,
  2018, NA, NA,
  2010, "2009-05-09", "2015-09-15"
)

df %>% filter(
  # Condition 1
  (is.na(From) & is.na(To)) | 
    # Condition 2
    ((Date > year(From))) & 
       (Date < year(To))))
  )

CodePudding user response:

Here's another solution, which utilises the {lubridate} package.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
dates <- tibble::tribble(
           ~Id,    ~Col1, ~Date,    ~FromDate,      ~ToDate, ~Weight,
            1L,   "Blue", 2018L,           NA,           NA,    24.9,
            2L,  "Green", 2018L,           NA,           NA,    14.5,
            3L,    "Red", 2010L, "2009-05-15", "2015-09-15",    87.5,
            3L,    "Red", 2010L, "2015-10-15", "2020-08-15",    43.8,
            4L, "Yellow", 2020L, "2019-10-15", "2021-05-15",    29.5,
            5L, "Purple", 2011L,           NA,           NA,      NA
           )

dates %>% 
  mutate(FromDate = lubridate::ymd(FromDate),
         ToDate   = lubridate::ymd(ToDate),
         Date     = lubridate::ymd(Date, truncated = 2L),
         Interval = lubridate::interval(ymd(FromDate), ymd(ToDate))) %>% 
  filter( ((Date %within% Interval) == TRUE) | (is.na(FromDate) & is.na(ToDate))) %>% 
  select(-Interval)
#> # A tibble: 5 × 6
#>      Id Col1   Date       FromDate   ToDate     Weight
#>   <int> <chr>  <date>     <date>     <date>      <dbl>
#> 1     1 Blue   2018-01-01 NA         NA           24.9
#> 2     2 Green  2018-01-01 NA         NA           14.5
#> 3     3 Red    2010-01-01 2009-05-15 2015-09-15   87.5
#> 4     4 Yellow 2020-01-01 2019-10-15 2021-05-15   29.5
#> 5     5 Purple 2011-01-01 NA         NA           NA

Created on 2021-10-11 by the reprex package (v2.0.1)

  • Related