I have a variable in Stata in my dataset that looks like this:
city
Washington city
Boston city
El Paso city
Nashville-Davidson metropolitan government (balance)
Lexington-Fayette urban county
And I want it to look like:
city
Washington
Boston
El Paso
Nashville-Davidson
Lexington-Fayette
"city," "county," and "urban county" are the only three words that follow after a city name. In other words, I want to extract the substring from left to the space before either city, county, or urban.
The only way I can think of approaching this using subinstring:
replace city = subinstr(city, " city", "", .)
I don't think, however, that I can add multiple options here.
CodePudding user response:
I used subinstr
to replace the desired words with empty strings, and trim
to remove additional spaces.
input str60(city)
"Washington city"
"Boston city"
"El Paso city"
"Lexington-Fayette urban county"
"Audacity"
end
gen wanted = subinstr(city,"urban county","",1)
replace wanted = subinstr(wanted," county","",1)
replace wanted = subinstr(wanted," city","",1)
replace wanted = trim(wanted)
list
----------------------------------------------------
| city wanted |
|----------------------------------------------------|
1. | Washington city Washington |
2. | Boston city Boston |
3. | El Paso city El Paso |
4. | Lexington-Fayette urban county Lexington-Fayette |
5. | Audacity Audacity |
----------------------------------------------------
Edit: As suggested, I have incorporated a leading space so that places with "city" in their name (e.g. Audacity) are not inadvertently replaced. The same for "county" (although this seems less likely).
CodePudding user response:
split
could be a way.
split city, parse(" city" " urban" " county") limit(1)