Say I have this data:
sysuse auto2, clear
keep if _n<=4
describe
local N = r(N)
gen a1 = price
gen a2 = mpg
gen a3 = headroom
gen a4 = trunk
gen a5 = weight
gen a6 = length
input yearA yearB
1 4
1 5
2 5
1 6
keep a1-a6 yearA yearB
I'd like to do a row-specific operation based on the value of other variables. As an example, I'd like to add up all a
columns corresponding to some row-specific rule, in this case starting a year after yearA
and a year before yearB
. So, if yearA==1
and yearB==5
, the starting year is 2
and the end year is 4
, so we would add a2
, a3
, and a4
together to get that row's total
. Each row has its own rule corresponding to (a function of) its values of yearA
and yearB
.
I came up with the following solution, which works, but it is clunky and slow:
gen total = .
forvalues i = 1/`N' {
local start = yearA[`i'] 1
local end = yearB[`i']-1
display "`start' `end'"
*annoyingly, you can't replace with egen, so create a new variable and delete it
egen total`i' = rowtotal(a`start'-a`end')
replace total = total`i' if _n==`i'
drop total`i'
}
As noted in the comment in the loop, I resorted to creating a new variable for each row and deleting it after using its value. Why? Because it doesn't seem like one can use replace
with egen
.
The actual application creates multiple variables and there are millions of observations, so it takes many hours or even days to run. What is a faster way to accomplish my goal? I am in now tied to doing things row-by-row if there is a better way.
CodePudding user response:
gen wanted = 0
forval j = 1/6 {
replace wanted = wanted a`j' if inrange(`j', yearA 1, yearB - 1)
}