I'm prepping a data frame for event history analysis. The "group" in question consists of US states and the outcome of interest is whether or not a state adopted a specific policy. Because I'm dealing with a non-repeating event (once a state adopts the policy, it is assumed to be binding from the year of adoption to the end of the dataset), I want to remove a state from the panel once it adopts the policy.
Suppose we're looking at Pennsylvania, Arizona, and Georgia with data from 2010-2015. Let's say Arizona adopts the policy in 2012. Setting up the data would look something like this:
# create the panel
year <- rep(2010:2015, times = 3)
state <- rep(c("AZ","PA","GA"), each = 6)
# data.frame() keeps year numeric; as.data.frame(cbind(...)) would coerce it to character
panel <- data.frame(year, state)
# create dummy to indicate adoption
panel$adopted <- 0
# set adopted = 1 when AZ adopts the policy
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1
I would then want to remove AZ's observations from the years 2013-2015 but keep all observations for GA and PA.
I've thought about generating some kind of loop that identifies the rows in which the adoption variable equals 1, creating a new variable that would identify subsequent rows as ones that need to be deleted, and then filtering out those rows:
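library(dplyr)  # for %>% and filter()
df$delete <- 0
for (row in seq_len(nrow(df))) {
  if (df$adopted[row] == 1) {
    # this only flags the single row right after adoption
    df$delete[row + 1] <- 1
  }
}
df <- df %>% filter(delete == 0)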
However, while I know how to call the next row (df$delete[row + 1]), I need to know how to call every row that follows the observation in which adopted == 1, up to the last row for that state. Any ideas? Happy to clarify if something is unclear.
CodePudding user response:
Try the data.table package:
library(data.table)
# convert to a data.table
panel <- as.data.table(panel)
# year of adoption by state (assumes at most one adoption row per state);
# indexing with [1] gives NA for states that never adopt
panel[, year_adopted := year[adopted == 1][1], by = state]
# keep rows up to and including the adoption year, plus every row for non-adopters
panel <- panel[is.na(year_adopted) | year <= year_adopted]
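If you'd rather not add a dependency, the same two steps (look up each state's adoption year, then keep rows up to that year) can be sketched in base R. This is only an illustration on the example panel from the question. The names `adopt_rows` and `trimmed` are made up here, and it assumes `year` is numeric and each state has at most one row with `adopted == 1`.

```r
# Rebuild the example panel from the question (year stays numeric)
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6)
)
panel$adopted <- 0
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# One row per adopting state (assumption: at most one adoption per state)
adopt_rows <- panel[panel$adopted == 1, c("state", "year")]

# Adoption year aligned to every row of the panel; NA for states that never adopt
year_adopted <- adopt_rows$year[match(panel$state, adopt_rows$state)]

# Keep non-adopters entirely, and adopters up to and including the adoption year
trimmed <- panel[is.na(year_adopted) | panel$year <= year_adopted, ]
```

`match()` returns NA for states with no adoption row, which is what lets the `is.na()` check keep PA and GA untouched.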
CodePudding user response:
I think it is much easier to tackle this without a loop. Since this will probably be something you'll want to do to every state in the dataset, a function might be useful:
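rm_after_adopted = function (panel, st) {
  # year in which state `st` adopted the policy (empty if it never did)
  year_adopted = with(panel, year[adopted == 1 & state == st])
  # nothing to drop for a state that never adopted
  if (length(year_adopted) == 0) return(panel)
  # logical mask of rows for `st` after its adoption year; unlike negative
  # indexing with which(), this cannot drop every row when there are no matches
  after_adopted = with(panel, state == st & year > min(year_adopted))
  return(panel[!after_adopted, ])
}
rm_after_adopted(panel, st = 'AZ')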
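To cover every state in one pass rather than calling the function state by state, one option is to split the panel by state, trim each piece, and bind the pieces back together. A rough sketch under the same assumptions as the question's example: `trim_state` is a hypothetical helper written for this illustration (not the function above), and `year` is assumed to be numeric.

```r
# Rebuild the example panel from the question (year stays numeric)
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6)
)
panel$adopted <- 0
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# Hypothetical helper: given one state's rows, keep those up to and
# including its adoption year
trim_state <- function(d) {
  year_adopted <- d$year[d$adopted == 1]
  if (length(year_adopted) == 0) return(d)  # never adopted: keep everything
  d[d$year <= min(year_adopted), ]
}

# Split by state, trim each piece, and stack the pieces again
trimmed <- do.call(rbind, lapply(split(panel, panel$state), trim_state))
rownames(trimmed) <- NULL
```

Note that `split()` orders the pieces by state name, so the result comes back sorted AZ, GA, PA rather than in the original row order.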