Shifting a date-based analysis by a year: process questions

Time:01-27

I am doing a migration study indexed to a specific event. To create the dataset, I basically subset a larger dataset to a specific date, and then made flags based on additional dates, then added in information. In total, this takes 7 scripts. Now, I want to create a comparison dataset, with the same information but indexed to two years earlier.

My question: is there an easy way to reuse the same scripts and tell R to treat every date as two years earlier, or do I have to duplicate the code and edit each date in line? Here's a very basic example of some of the code I'm using to generate the dataset from a larger framework:

#example of things I'd want shifted 2 years     
df <- subset(df, DATE_AFTER > as.Date("2016-09-27"))
df$flag <- with(df, 
                as.numeric(DATE_BEFORE < as.Date("2016-09-28") & 
                             DATE_AFTER > as.Date("2016-09-27")))
df
#   ID DATE_BEFORE DATE_AFTER flag
# 1  A  2013-01-23 2018-01-23    1
# 3  C  2018-01-23 2020-01-23    0
# 5  E  2011-01-23 2019-01-23    1
# 6  F  2010-01-23 2019-01-23    1
# 7  G  2017-01-23 2018-01-23    0

Dummy data

df <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G"),
                 DATE_BEFORE=as.Date(c("2013-01-23", "2010-01-23", "2018-01-23",
                                       "2014-01-23", "2011-01-23", "2010-01-23", 
                                       "2017-01-23")),
                 DATE_AFTER=as.Date(c("2018-01-23", "2016-01-23", "2020-01-23", 
                                      "2016-01-23", "2019-01-23", "2019-01-23", 
                                      "2018-01-23")))

CodePudding user response:

Just wrap it in a function. To subtract one year we may use as.POSIXlt as shown in this answer.

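my_df_subset <- \(date, subtract_yr=0L) {
  ## POSIXlt stores the year as a separate field, so it can be shifted directly
  dt <- as.POSIXlt(date)
  dt$year <- dt$year - subtract_yr
  dt <- as.Date(dt)
  ## keep rows after the index date, then flag those that span it
  transform(subset(df, DATE_AFTER > dt),
            flag=as.numeric(DATE_BEFORE < dt + 1L & 
                              DATE_AFTER > dt))
}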

my_df_subset("2016-09-27")
#   ID DATE_BEFORE DATE_AFTER flag
# 1  A  2013-01-23 2018-01-23    1
# 3  C  2018-01-23 2020-01-23    0
# 5  E  2011-01-23 2019-01-23    1
# 6  F  2010-01-23 2019-01-23    1
# 7  G  2017-01-23 2018-01-23    0

my_df_subset("2016-09-27", 2L)  ## two years earlier
#   ID DATE_BEFORE DATE_AFTER flag
# 1  A  2013-01-23 2018-01-23    1
# 2  B  2010-01-23 2016-01-23    1
# 3  C  2018-01-23 2020-01-23    0
# 4  D  2014-01-23 2016-01-23    1
# 5  E  2011-01-23 2019-01-23    1
# 6  F  2010-01-23 2019-01-23    1
# 7  G  2017-01-23 2018-01-23    0
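
As a variation on the function above, here is a minimal sketch that takes the data frame as an argument instead of reading df from the global environment, and shifts the date with seq() and a negative "years" step. The names shift_years and flag_by_index are made up for illustration and are not part of the original answer. Dates such as 29 February are worth checking separately with either approach.

```r
# Hypothetical helper: move a date back by whole years via seq()
shift_years <- function(date, years = 0L) {
  date <- as.Date(date)
  if (years == 0L) return(date)
  # by = "-2 years" steps backwards; the second element is the shifted date
  seq(date, by = paste(-years, "years"), length.out = 2L)[2L]
}

# Hypothetical helper: same subset and flag logic, data passed in explicitly
flag_by_index <- function(data, index_date) {
  index_date <- as.Date(index_date)
  out <- data[data$DATE_AFTER > index_date, ]
  out$flag <- as.numeric(out$DATE_BEFORE < index_date + 1L &
                           out$DATE_AFTER > index_date)
  out
}

# Dummy data from the question
df <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G"),
                 DATE_BEFORE=as.Date(c("2013-01-23", "2010-01-23", "2018-01-23",
                                       "2014-01-23", "2011-01-23", "2010-01-23",
                                       "2017-01-23")),
                 DATE_AFTER=as.Date(c("2018-01-23", "2016-01-23", "2020-01-23",
                                      "2016-01-23", "2019-01-23", "2019-01-23",
                                      "2018-01-23")))

main_df <- flag_by_index(df, shift_years("2016-09-27"))       # original index
comp_df <- flag_by_index(df, shift_years("2016-09-27", 2L))   # two years earlier
```

Passing the data in explicitly makes it easier to call the same function from several scripts without relying on an object named df existing in the workspace.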

Note: R >= 4.1 used.
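
For the seven-script case, one possible pattern (a sketch, not something tested on the real pipeline) is to replace every hard-coded date in the scripts with a variable, here assumed to be called index_date, and run the scripts from a small driver that sets it. The run_pipeline function and the demo script below are hypothetical stand-ins; the real script paths would go in their place.

```r
# Hypothetical driver: run the same scripts for a given index date.
# Each script is assumed to use `index_date` instead of a literal date.
run_pipeline <- function(index_date, scripts) {
  env <- new.env(parent = globalenv())
  env$index_date <- as.Date(index_date)
  # evaluate every script inside env so separate runs do not overwrite each other
  for (s in scripts) source(s, local = env)
  env
}

# Stand-in for one of the real scripts, written to a temporary file
demo_script <- tempfile(fileext = ".R")
writeLines(c(
  "cutoff <- index_date",
  "day_after <- index_date + 1L"
), demo_script)

main <- run_pipeline("2016-09-27", demo_script)  # original analysis
comp <- run_pipeline("2014-09-27", demo_script)  # comparison, two years earlier

main$cutoff
comp$cutoff
```

Each run keeps its objects in its own environment, so the original and comparison datasets can sit side by side in one session.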


Data:

df <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G"), DATE_BEFORE = structure(c(15728, 
14632, 17554, 16093, 14997, 14632, 17189), class = "Date"), DATE_AFTER = structure(c(17554, 
16823, 18284, 16823, 17919, 17919, 17554), class = "Date")), class = "data.frame", row.names = c(NA, 
-7L))