I have a dataset with Character and Date variables. I would like to find the smallest and largest date in my dataset.
I am trying to use the pmin function but this does not seem to be working. Once the max and min date have been extracted, I want to create a dataset with a sequence of dates between them. For example, if the oldest date is 2021-02-01 (from new column) and most recent is 2022-06-20 (from old column) I want to create a list of dates between the two.
Table:
ID | Old | New | Tier |
---|---|---|---|
001 | NA | 2021-02-01 | A |
002 | NA | 2021-02-01 | A |
003 | NA | 2021-02-21 | A |
004 | NA | 2021-04-21 | A |
005 | NA | 2021-04-21 | A |
006 | NA | 2021-04-21 | A |
006 | 2022-06-20 | 2021-04-21 | B |
002 | 2021-08-10 | 2021-04-21 | B |
003 | 2022-06-20 | 2021-05-01 | B |
003 | 2022-06-20 | 2021-05-01 | B |
003 | 2021-08-10 | 2021-05-21 | B |
003 | 2021-08-10 | 2021-07-21 | B |
Format variables in extended data: using str()
$ Old : Date, format: "2021-04-30" $ Id : chr $ New : Date, format: "2021-02-03" "2021-02-03" $ New1 : Date, format: NA NA NA NA ... $ New2 : Date, format: "2021-01-10" "2021-01-10" $ New3 : Date, format: NA NA "2021-06-10" NA ... $ New4 : Date, format: NA NA NA NA ... $ New5 : Date, format: NA NA "2022-07-10" NA ...
CodePudding user response:
In base R you can get the date range like this:
range(unlist(df[sapply(df, class) == "Date"]), na.rm = TRUE) |>
as.Date(origin = "1970-01-01")
#> [1] "2021-02-01" "2022-06-20"
Explanation
To work with just columns of class "Date" in your data frame, you can do df[sapply(df, class) == "Date"]
. If you unlist
these columns, they form a single vector from which you can get the range
(i.e. min / max), being sure you exclude NA
values.
Unfortunately, these steps remove the class attribute from the vector, so you need to convert it back to a date.
CodePudding user response:
Same basic idea as Allan but an approach that preserves the class:
do.call(range, Filter(\(x) inherits(x, "Date"), dat))
[1] "2022-06-22" "2022-08-07"
Data:
dat <- data.frame(a = Sys.Date() sample(50, 5),
b = letters[1:5],
c = Sys.Date() sample(50, 5),
d = runif(5))