I am doing a migration study indexed to a specific event. To create the dataset, I basically subset a larger dataset to a specific date, and then made flags based on additional dates, then added in information. In total, this takes 7 scripts. Now, I want to create a comparison dataset, with the same information but indexed to two years earlier.
My question is, is there an easy way where I can use the same script and just tell R to somehow treat all the code as two years before, or do I have to create a duplicate of the code and then edit it in line to be two years before. Here's a very basic example of some of the code I'm using to generate the dataset from a larger framework:
#example of things I'd want shifted 2 years
df <- subset(df, DATE_AFTER > as.Date("2016-09-27"))
df$flag <- with(df,
as.numeric(DATE_BEFORE < as.Date("2016-09-28") &
DATE_AFTER > as.Date("2016-09-27")))
df
# ID DATE_BEFORE DATE_AFTER flag
# 1 A 2013-01-23 2018-01-23 1
# 3 C 2018-01-23 2020-01-23 0
# 5 E 2011-01-23 2019-01-23 1
# 6 F 2010-01-23 2019-01-23 1
# 7 G 2017-01-23 2018-01-23 0
Dummy data
df <- data.frame(ID=c("A", "B", "C", "D", "E", "F", "G"),
DATE_BEFORE=as.Date(c("2013-01-23", "2010-01-23", "2018-01-23",
"2014-01-23", "2011-01-23", "2010-01-23",
"2017-01-23")),
DATE_AFTER=as.Date(c("2018-01-23", "2016-01-23", "2020-01-23",
"2016-01-23", "2019-01-23", "2019-01-23",
"2018-01-23")))
CodePudding user response:
Just wrap it in a function. To subtract on year we may use as.POSIXlt
as shown in this answer.
my_df_subset <- \(date, subtract_yr=0L) {
dt <- as.POSIXlt(paste0(date, '-01'))
dt$year <- dt$year - subtract_yr
dt <- as.Date(dt)
transform(subset(df, DATE_AFTER > dt),
flag=as.numeric(DATE_BEFORE < dt 1L &
DATE_AFTER > dt))
}
my_df_subset("2016-09-27")
# ID DATE_BEFORE DATE_AFTER flag
# 1 A 2013-01-23 2018-01-23 1
# 3 C 2018-01-23 2020-01-23 0
# 5 E 2011-01-23 2019-01-23 1
# 6 F 2010-01-23 2019-01-23 1
# 7 G 2017-01-23 2018-01-23 0
my_df_subset("2016-09-27", 2L) ## two years earlier
# ID DATE_BEFORE DATE_AFTER flag
# 1 A 2013-01-23 2018-01-23 1
# 2 B 2010-01-23 2016-01-23 1
# 3 C 2018-01-23 2020-01-23 0
# 4 D 2014-01-23 2016-01-23 1
# 5 E 2011-01-23 2019-01-23 1
# 6 F 2010-01-23 2019-01-23 1
# 7 G 2017-01-23 2018-01-23 0
Note: R >= 4.1 used.
Data:
df <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G"), DATE_BEFORE = structure(c(15728,
14632, 17554, 16093, 14997, 14632, 17189), class = "Date"), DATE_AFTER = structure(c(17554,
16823, 18284, 16823, 17919, 17919, 17554), class = "Date")), class = "data.frame", row.names = c(NA,
-7L))