Getting the same day across different years in R

Time:07-21

I have a daily time series spanning a couple of years. Some of the recorded values are clearly wrong (for example, negative values for a variable that cannot go below zero), and I would like to replace them. My idea is to "interpolate" each bad value using both the mean of the days around it and the mean of the same day (or the same few days) in the other years, since the data has yearly seasonality. I'm still unsure about this part, so any comment would be greatly appreciated.

So my question is whether I can easily access the same day across different years.

Here's a dummy example of my data:

library(tidyverse)
library(lubridate)

              date      value
2016-10-01 00:00:00     28  
2016-10-02 00:00:00     25    
2016-10-03 00:00:00     24   
2016-10-04 00:00:00     22     
2016-10-05 00:00:00     -6    
2016-10-06 00:00:00     26 

I have that for years 2016 through 2020. So in this example I would use the dates around 2016-10-05 AND I would like to use the dates around the 5th of October from years 2017 to 2020 to kind of maintain the seasonality, but maybe this is incorrect.

I tried to use years() from lubridate, but I still have to do things manually and I would like to automate this.

CodePudding user response:

If your question is solely "whether [you] can easily access the same day [across] different years", you could do that as follows:

# say your data frame is called df
library(lubridate)
day(df$date)

This returns the day of the month (1 to 31) for every entry in that column of your data frame.
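As a small illustration of what day() gives back, here is a sketch on a made-up data frame. The column names date and value follow the dummy example in the question, and the values themselves are invented for the demonstration. month() is shown alongside because the day of the month on its own does not identify a calendar day.

```r
library(lubridate)

# Hypothetical data modelled on the question; the values are made up
df <- data.frame(
  date  = as.POSIXct(c("2016-10-04", "2016-10-05", "2017-10-05", "2018-10-05"),
                     tz = "UTC"),
  value = c(22, -6, 27, 25)
)

df$day   <- day(df$date)    # day of the month, 1 to 31
df$month <- month(df$date)  # month of the year, 1 to 12

df$day    # 4 5 5 5
df$month  # 10 10 10 10
```

lubridate also has yday() for the day of the year, but that number shifts by one after February in leap years, so month plus day is probably the safer key for "the same day" here.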

Edit to reply to comment from asker:

This is a very basic way to specify the day and month for which you would like to obtain the corresponding rows in your data frame:

df[day(df$date) == 5 & month(df$date) == 10, ]
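To avoid repeating that filter by hand for every bad observation, one option is to group by month and day and let dplyr do the lookup. The following is only a sketch under assumptions: the column names date and value come from the question, the values are invented, "clearly wrong" is taken to mean value < 0, and the helper columns mon, dom, value_clean and value_fixed are names made up for this example. It covers only the same-day-in-other-years part of the idea, not the mean of neighbouring days.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: one observation on 5 October in each year (values made up)
df <- tibble(
  date  = as.POSIXct(c("2016-10-05", "2017-10-05", "2018-10-05",
                       "2019-10-05", "2020-10-05"), tz = "UTC"),
  value = c(-6, 24, 26, 22, 28)
)

fixed <- df %>%
  # Assumption: negative values are the invalid ones; mark them as missing
  mutate(value_clean = if_else(value < 0, NA_real_, value)) %>%
  # Rows sharing the same month and day of month form one group across years
  group_by(mon = month(date), dom = day(date)) %>%
  # Fill each missing value with the mean of the valid values in its group
  mutate(value_fixed = if_else(is.na(value_clean),
                               mean(value_clean, na.rm = TRUE),
                               value_clean)) %>%
  ungroup()

fixed$value_fixed  # 25 24 26 22 28
```

If every year has an invalid value on a given calendar day, the group mean is NaN, so such days would still need separate handling.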