Removing a Group When A Certain Value is Reached in R

I'm preparing a data frame for event history analysis. The "groups" in question are US states, and the outcome of interest is whether a state adopted a specific policy. Because adoption is a non-repeating event (once a state adopts the policy, it is assumed to be binding from the year of adoption to the end of the dataset), I want to remove a state from the panel after the year in which it adopts the policy.

Suppose we're looking at Pennsylvania, Arizona, and Georgia with data from 2010-2015. Let's say Arizona adopts the policy in 2012. Setting up the data would look something like this:

# create the panel
year <- rep(2010:2015, times = 3)
state <- rep(c("AZ","PA","GA"), each = 6)

# data.frame() keeps year numeric; cbind() would first coerce everything to character
panel <- data.frame(year, state)

# create dummy to indicate adoption
panel$adopted <- 0

# set adopted = 1 when AZ adopts the policy
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

I would then want to remove AZ's observations from the years 2013-2015 but keep all observations for GA and PA.

I've thought about writing a loop that finds the rows where the adoption variable equals 1, flags the subsequent rows for deletion in a new variable, and then filters those rows out:

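library(dplyr)

df <- panel
df$delete <- 0

# flag only the row directly after each adoption (stop one short of the end)
for (row in seq_len(nrow(df) - 1)) {
  if (df$adopted[row] == 1) {
    df$delete[row + 1] <- 1
  }
}

df <- df %>% filter(delete == 0)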

However, while I know how to refer to the next row (df$delete[row + 1]), I need a way to refer to every row that follows the observation in which adopted == 1, up to the last row for that state. Any ideas? Happy to clarify if something is unclear.
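One way to flag every later row, not just the next one, is a cumulative sum within each state: a row should be deleted exactly when an earlier row of the same state has adopted == 1. A minimal base R sketch of that idea, assuming each state's rows can be sorted by year; `prior_adoptions` and `result` are illustrative names, not from the question:

```r
# Rebuild the example panel with a numeric year column
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6)
)
panel$adopted <- 0
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# The cumulative sum below relies on rows being in year order within each state
panel <- panel[order(panel$state, panel$year), ]

# Within each state, count adoptions in strictly earlier rows:
# shift adopted down by one position, then take the running total
prior_adoptions <- ave(panel$adopted, panel$state,
                       FUN = function(a) cumsum(c(0, head(a, -1))))

# Flag a row once any earlier row of the same state has adopted
panel$delete <- as.integer(prior_adoptions > 0)

result <- panel[panel$delete == 0, ]
```

The adoption year itself is kept because the shift means its own adoption is not counted until the following row.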

Answer:

Try the data.table package:

library(data.table)

# convert to a data.table
panel <- as.data.table(panel)
# year of adoption by state: the year of the first row with adopted == 1,
# or NA if the state never adopts (rows are assumed to be sorted by year)
panel[, year_adopted := year[match(1, adopted)], by = .(state)]
# keep rows up to and including the adoption year, plus all rows of states that never adopt
panel <- panel[year <= year_adopted | is.na(year_adopted)]
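The same two steps (look up each state's adoption year, then keep rows up to it) can also be sketched in base R with `ave()`, for anyone not using data.table. This is an illustrative translation, not the answerer's code; `adoption_year` and `result` are hypothetical names:

```r
# Rebuild the example panel (year stays numeric)
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6)
)
panel$adopted <- 0
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# Year of the adoption row, NA everywhere else
adoption_year <- ifelse(panel$adopted == 1, panel$year, NA)

# Spread each state's earliest adoption year over all of its rows;
# states that never adopt stay NA
panel$year_adopted <- ave(adoption_year, panel$state, FUN = function(y) {
  if (all(is.na(y))) NA else min(y, na.rm = TRUE)
})

# Keep rows up to and including adoption, plus all rows of non-adopting states
result <- panel[is.na(panel$year_adopted) | panel$year <= panel$year_adopted, ]
```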

CodePudding user response：

I think it is much easier to tackle this without a loop. Since this will probably be something you'll want to do to every state in the dataset, a function might be useful:

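rm_after_adopted = function(panel, st) {
  year_adopted = with(panel, year[adopted == 1 & state == st])
  # a state that never adopts keeps all of its rows
  # (negative indexing with an empty index, panel[-integer(0), ], would drop every row)
  if (length(year_adopted) == 0) return(panel)
  # logical index: rows of this state observed after its adoption year
  after_adopted = with(panel, state == st & year > min(year_adopted))
  return(panel[!after_adopted, ])
}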

rm_after_adopted(panel, st = 'AZ')
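The answer suggests this will be needed for every state. One way to cover all states in one pass, sketched here in base R under the assumption that each state's rows can be filtered independently, is split-apply-combine: `split()` the panel by state, keep each piece up to its first adoption year, and `rbind()` the pieces back together. `keep_until_adoption` is a hypothetical helper, not part of the answer:

```r
# Rebuild the example panel (year stays numeric)
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6)
)
panel$adopted <- 0
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# Keep one state's rows up to its first adoption year;
# appending Inf means a state that never adopts keeps every row
keep_until_adoption <- function(d) {
  cutoff <- min(c(d$year[d$adopted == 1], Inf))
  d[d$year <= cutoff, ]
}

# Split by state, filter each piece, and stack the pieces back together
result <- do.call(rbind, lapply(split(panel, panel$state), keep_until_adoption))
rownames(result) <- NULL
```

Appending `Inf` avoids calling `min()` on an empty vector, which would otherwise warn and return `Inf` anyway.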