How to recode this string variable into a new variable?

Time:11-03

I want to recode my Stata variable Ucod, which has more than 100,000 different observations, into 3-4 categories stored in a new variable.

The problem is that I don't want to type out every value of Ucod in the recode. For example, I want to use a condition such as: if a value of Ucod starts with I (e.g., I234, I345, I587), recode it to CVD.

I have tried the strpos() function with different conditions, but without success.

A picture of my data and the variable Ucod is attached (image: View of my data set).

CodePudding user response:

You could just use gen and a series of replace commands:

gen ucod_category = 0 if ucod >= "I00" & ucod <= "I519"
replace ucod_category = 1 if ucod >= "I60" & ucod <= "I698"

Then label these categories as CVD, Stroke, etc. Because the comparison is done on strings, this should sort in the expected way for your ICD-10 codes with missing decimal points (e.g. "I519" < "I60").
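
The labelling step could look like the sketch below. The toy codes and the label name ucod_lbl are made up for illustration. The last gen line is an alternative that looks only at the first letter with substr(), which is closer to what the question asks; treating every I code as CVD is an assumption made for the example.

```stata
* Toy data standing in for the real ucod variable (values are illustrative)
clear
input str4 ucod
"I219"
"I519"
"I60"
"I698"
"C349"
end

* Range-based categories, compared as strings
gen ucod_category = 0 if ucod >= "I00" & ucod <= "I519"
replace ucod_category = 1 if ucod >= "I60" & ucod <= "I698"

* Attach value labels (ucod_lbl is a hypothetical label name)
label define ucod_lbl 0 "CVD" 1 "Stroke"
label values ucod_category ucod_lbl

* Alternative: classify on the first letter only
gen ucod_group = "CVD" if substr(ucod, 1, 1) == "I"

tabulate ucod_category, missing
```

Codes outside both ranges, such as "C349" here, are left missing in ucod_category, so it is worth tabulating with the missing option to see what was not classified.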

However it might be more convenient to convert ucod into a number (with first digit 0 for A, 1 for B etc.) so that you can recode it with labels in a single command:

gen double ucod_numeric = (ascii(substr(ucod, 1, 1)) - 65) * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category) 

Again, this should sort in the expected order: I519 (which becomes 851.9) < I60 (860).

EDIT: since ascii() isn't working (possibly a Stata version issue), you can try something like this to convert the letter to a number:

gen ucod_letter_code = -1
forvalues i = 0/25 {
     replace ucod_letter_code = `i' if substr(ucod, 1, 1) == char(`i' + 65)
}
gen double ucod_numeric = ucod_letter_code * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)
recode ucod_numeric (800/851.9=0 "CVD") (860/869.8=1 "Stroke"), generate(ucod_category)
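
A loop-free variant of the letter-to-number step is sketched below, assuming every code starts with an uppercase letter A-Z. strpos() returns the 1-based position of the letter in the alphabet string (0 if it is not found), so subtracting 1 gives A = 0, ..., I = 8. The toy data are illustrative, and the scaling by 100 is the one implied by the values quoted above (I519 becomes 851.9, I60 becomes 860).

```stata
* Toy data standing in for the real ucod variable (values are illustrative)
clear
input str4 ucod
"I519"
"I60"
"I698"
"A09"
end

* Position of the first character in the alphabet, minus 1 (A = 0, ..., Z = 25);
* codes that do not start with A-Z get -1
gen ucod_letter_code = strpos("ABCDEFGHIJKLMNOPQRSTUVWXYZ", substr(ucod, 1, 1)) - 1

* Letter block (x100) plus the digits; 4-character codes carry one implied decimal.
* double reduces rounding error near the recode range endpoints
gen double ucod_numeric = ucod_letter_code * 100 + real(substr(ucod, 2, .)) / cond(strlen(ucod) == 4, 10, 1)

list ucod ucod_letter_code ucod_numeric
```

The resulting ucod_numeric can then be passed to the same recode command as above.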