Compare dates in a dataframe column with a single date

Time:11-23

For each row of a data frame column, I'm trying to compare the date with a single date and keep the later of the two (the maximum). For example:

   date
1  2018-07-31
2  2018-08-01
3  2018-08-02
4  2018-08-03

If compare_date = "2018-08-02", the output for each row should be the later of that row's date and compare_date, so the new data frame would look like this:

   new_date
1  2018-08-02
2  2018-08-02
3  2018-08-02
4  2018-08-03

I tried to solve this with sapply:

data$new_date <- sapply(data$date,function(x){max(x,compare_date)})

But the output is numeric instead of a date, like this:

   date        new_date
1  2018-07-31  17745
2  2018-08-01  17745
3  2018-08-02  17745
4  2018-08-03  17746

Note that I had already converted data$date and compare_date to Date with as.Date.

Why is the output not in Date format? Am I using sapply the wrong way?

Answer:

Instead of looping with sapply, you can use one of R's vectorised functions. In this case, pmax (element-wise "parallel" maximum) does the job and keeps the Date class:

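df$date <- as.Date(df$date)            # convert the character column to Date
compare_date <- as.Date("2018-08-02")
# element-wise maximum; the single compare_date is recycled to the length of df$date
df$date <- pmax(df$date, compare_date)
df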

#        date
#1 2018-08-02
#2 2018-08-02
#3 2018-08-02
#4 2018-08-03
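The question asks for a separate new_date column rather than overwriting date. A minimal sketch of that variant, rebuilding the sample data directly as Date values so it runs on its own (the column name new_date is taken from the question):

```r
# Sample data, same values as in the question, stored as Date
df <- data.frame(date = as.Date(c("2018-07-31", "2018-08-01",
                                  "2018-08-02", "2018-08-03")))
compare_date <- as.Date("2018-08-02")

# pmax compares element-wise; compare_date is recycled, and because the
# first argument is a Date the result is still a Date vector
df$new_date <- pmax(df$date, compare_date)

class(df$new_date)   # "Date"
df
```

Keeping the original column makes it easy to check which rows were actually changed.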

Sample data:

df <- structure(list(date = c("2018-07-31", "2018-08-01", "2018-08-02", 
"2018-08-03")), class = "data.frame", row.names = c(NA, -4L))

Answer:

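sapply simplifies its result with unlist(), which drops the Date class and leaves the underlying numbers (days since 1970-01-01). You can see those underlying values with as.vector: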

as.vector(data$date)
[1] 17743 17744 17745 17746
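A small sketch of where the class gets lost, using the question's dates (the variable names d, res_list, res_num and res_date are just for illustration). It also shows that the numeric result can be turned back into Dates, since the values are day counts from 1970-01-01:

```r
d <- as.Date(c("2018-07-31", "2018-08-01", "2018-08-02", "2018-08-03"))
compare_date <- as.Date("2018-08-02")

# lapply keeps every element as a Date ...
res_list <- lapply(d, function(x) max(x, compare_date))
class(res_list[[1]])   # "Date"

# ... but sapply then simplifies the list with unlist(), which drops the
# "Date" class attribute and leaves only the underlying day counts
res_num <- sapply(d, function(x) max(x, compare_date))
class(res_num)         # "numeric"

# Convert the day counts back to Date; origin is given explicitly so this
# also works on R versions older than 4.3, where it was required
res_date <- as.Date(res_num, origin = "1970-01-01")
```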

Using lapply and then combining the results with Reduce(c, ...) keeps the Date class, because c() has a method for Date objects:

data$new_date <- Reduce(c,lapply(data$date,function(x){max(x,as.Date("2018-08-02"))}))
data

        date   new_date
1 2018-07-31 2018-08-02
2 2018-08-01 2018-08-02
3 2018-08-02 2018-08-02
4 2018-08-03 2018-08-03
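A variant of the same idea, swapping Reduce() for do.call() (this is not part of the original answer): Reduce(c, ...) grows the result one element at a time, while do.call(c, ...) passes the whole list to c() in a single call. A minimal sketch, with illustrative variable names:

```r
d <- as.Date(c("2018-07-31", "2018-08-01", "2018-08-02", "2018-08-03"))
compare_date <- as.Date("2018-08-02")

# One Date per element, as in the answer above
res_list <- lapply(d, function(x) max(x, compare_date))

# c() dispatches on its first argument; since that is a Date,
# the Date method is used and the combined result stays a Date vector
new_date <- do.call(c, res_list)
```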