I am dealing with a dataset like this
Id Col1 Date FromDate ToDate Weight
1 Blue 2018 NA NA 24.9
2 Green 2018 NA NA 14.5
3 Red 2010 2009-05-15 2015-09-15 87.5
3 Red 2010 2015-10-15 2020-08-15 43.8
4 Yellow 2020 2019-10-15 2021-05-15 29.5
5 Purple 2011 NA NA NA
I like to retain rows that are NA in From and To Date column because there is nothing to compare the Date values in column Date and
If From and To Date is not missing then retain only those where the Date value is between the From and To Date.
Based on the logic the expected dataset would be like this
Id Col1 Date FromDate ToDate Weight
1 Blue 2018 NA NA 24.9
2 Green 2018 NA NA 14.5
3 Red 2010 2009-05-15 2015-09-15 87.5
4 Yellow 2020 2019-10-15 2021-05-15 29.5
5 Purple 2011 NA NA NA
I know I can use the filter
function but I am not sure how to make it work with the NA values not excluded. Any suggestion is much appreciated.
CodePudding user response:
You should be able to do like this:
df <- tribble(
~Date, ~From, ~To,
2018, NA, NA,
2010, "2009-05-09", "2015-09-15"
)
df %>% filter(
# Condition 1
(is.na(From) & is.na(To)) |
# Condition 2
((Date > year(From))) &
(Date < year(To))))
)
CodePudding user response:
Here's another solution, which utilises the {lubridate}
package.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
dates <- tibble::tribble(
~Id, ~Col1, ~Date, ~FromDate, ~ToDate, ~Weight,
1L, "Blue", 2018L, NA, NA, 24.9,
2L, "Green", 2018L, NA, NA, 14.5,
3L, "Red", 2010L, "2009-05-15", "2015-09-15", 87.5,
3L, "Red", 2010L, "2015-10-15", "2020-08-15", 43.8,
4L, "Yellow", 2020L, "2019-10-15", "2021-05-15", 29.5,
5L, "Purple", 2011L, NA, NA, NA
)
dates %>%
mutate(FromDate = lubridate::ymd(FromDate),
ToDate = lubridate::ymd(ToDate),
Date = lubridate::ymd(Date, truncated = 2L),
Interval = lubridate::interval(ymd(FromDate), ymd(ToDate))) %>%
filter( ((Date %within% Interval) == TRUE) | (is.na(FromDate) & is.na(ToDate))) %>%
select(-Interval)
#> # A tibble: 5 × 6
#> Id Col1 Date FromDate ToDate Weight
#> <int> <chr> <date> <date> <date> <dbl>
#> 1 1 Blue 2018-01-01 NA NA 24.9
#> 2 2 Green 2018-01-01 NA NA 14.5
#> 3 3 Red 2010-01-01 2009-05-15 2015-09-15 87.5
#> 4 4 Yellow 2020-01-01 2019-10-15 2021-05-15 29.5
#> 5 5 Purple 2011-01-01 NA NA NA
Created on 2021-10-11 by the reprex package (v2.0.1)