How do I subset a panel data set with three criteria in Stata?

Time:10-22

I have a panel dataset that looks like this:

person_id year cash
222 2020q4 6,000
222 2021q1 7,000
222 2021q2 8,000
321 2020q4 4,000
321 2021q4 11,000
321 2021q2 15,000

I want to subset the dataset to the person_ids that have less than 10,000 in cash in 2021q2, keeping all of their earlier observations as well, and drop every person_id that does not meet that condition in 2021q2. How can I subset the data using these three variables?

So in this example I want to keep all observations of person_id 222 but drop all observations of 321.

Appreciate any assistance. Thanks!

Answer:

There might be an even more succinct way to do this, but I would split it into these three steps:

clear
input int person_id str6 year int cash
222 "2020q4"  6000
222 "2021q1"  7000
222 "2021q2"  8000
321 "2020q4"  4000
321 "2021q4" 11000
321 "2021q2" 15000
end

*Flag the observation if cash is below 10000 in 2021q2
gen subset_obs = (cash < 10000 & year == "2021q2")

*Within each person_id, take the max of subset_obs so a 1 is copied to every row of that ID
bysort person_id : egen subset_id = max(subset_obs)

*Keep only IDs where subset_id is 1
keep if subset_id == 1
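The three steps can also be collapsed into a single `egen` call, because `egen`'s `max()` accepts an expression, not just a variable. The sketch below tests the question's condition (cash below 10,000 in 2021q2). It also assumes, hypothetically, that `year` is really a Stata quarterly date (format `%tq`), which displays as `2021q2` just like the question's listing; the block converts the string with `quarterly()` and compares against `tq(2021q2)`. If `year` stays a string, write `year == "2021q2"` instead. The names `year_str` and `keep_id` are illustrative only.

```stata
clear
* year_str is an illustrative name for the string version of the quarter
input int person_id str6 year_str int cash
222 "2020q4"  6000
222 "2021q1"  7000
222 "2021q2"  8000
321 "2020q4"  4000
321 "2021q4" 11000
321 "2021q2" 15000
end

* Convert the string quarter to a numeric Stata quarterly date (%tq)
gen year = quarterly(year_str, "YQ")
format year %tq
drop year_str

* One step: the expression is 1 only on the 2021q2 row with cash < 10000;
* egen max() within person_id copies that 1 to every row of the same ID
* (keep_id is an illustrative name)
bysort person_id: egen keep_id = max(cash < 10000 & year == tq(2021q2))
keep if keep_id == 1
drop keep_id

list, sepby(person_id)
```

With the example data this should leave only the three rows for person_id 222. Using `<` also behaves sensibly with missing cash: Stata treats missing numeric values as larger than any number, so `cash < 10000` is false for a missing value, whereas a `cash > 10000` test would count missing cash as true. This keeps every observation of a qualifying ID; if you only want the quarters up to and including 2021q2, add `keep if year <= tq(2021q2)` afterwards.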