Home > Mobile >  Finding the smallest and largest date in my dataset in R
Finding the smallest and largest date in my dataset in R

Time:06-18

I have a dataset with Character and Date variables. I would like to find the smallest and largest date in my dataset.

I am trying to use the pmin function but this does not seem to be working. Once the max and min date have been extracted, I want to create a dataset with a sequence of dates between them. For example, if the oldest date is 2021-02-01 (from new column) and most recent is 2022-06-20 (from old column) I want to create a list of dates between the two.

Table:

ID Old New Tier
001 NA 2021-02-01 A
002 NA 2021-02-01 A
003 NA 2021-02-21 A
004 NA 2021-04-21 A
005 NA 2021-04-21 A
006 NA 2021-04-21 A
006 2022-06-20 2021-04-21 B
002 2021-08-10 2021-04-21 B
003 2022-06-20 2021-05-01 B
003 2022-06-20 2021-05-01 B
003 2021-08-10 2021-05-21 B
003 2021-08-10 2021-07-21 B

Format variables in extended data: using str()

$ Old : Date, format: "2021-04-30" $ Id : chr $ New : Date, format: "2021-02-03" "2021-02-03" $ New1 : Date, format: NA NA NA NA ... $ New2 : Date, format: "2021-01-10" "2021-01-10" $ New3 : Date, format: NA NA "2021-06-10" NA ... $ New4 : Date, format: NA NA NA NA ... $ New5 : Date, format: NA NA "2022-07-10" NA ...

CodePudding user response:

In base R you can get the date range like this:

range(unlist(df[sapply(df, class) == "Date"]), na.rm = TRUE) |>
  as.Date(origin = "1970-01-01")
#> [1] "2021-02-01" "2022-06-20"

Explanation

To work with just columns of class "Date" in your data frame, you can do df[sapply(df, class) == "Date"]. If you unlist these columns, they form a single vector from which you can get the range (i.e. min / max), being sure you exclude NA values.

Unfortunately, these steps remove the class attribute from the vector, so you need to convert it back to a date.

CodePudding user response:

Same basic idea as Allan but an approach that preserves the class:

do.call(range, Filter(\(x) inherits(x, "Date"), dat))

[1] "2022-06-22" "2022-08-07"

Data:

dat <- data.frame(a = Sys.Date()   sample(50, 5),
                  b = letters[1:5],
                  c = Sys.Date()   sample(50, 5),
                  d = runif(5))
  • Related