Substring extraction with multiple options

Time:03-03

I have a string variable in my Stata dataset that looks like this:

city
Washington city
Boston city
El Paso city
Nashville-Davidson metropolitan government (balance)
Lexington-Fayette urban county

And I want it to look like:

city
Washington
Boston
El Paso
Nashville-Davidson
Lexington-Fayette

"city", "county", and "urban county" are the only three suffixes that follow a city name. In other words, I want to extract the substring from the left up to the space before "city", "county", or "urban".

The only way I can think of approaching this is with subinstr():

replace city = subinstr(city, " city", "", .)

However, I don't think I can pass more than one search string to subinstr() here.

CodePudding user response:

I used subinstr() to replace the unwanted words with empty strings, and trim() to remove any leftover leading or trailing spaces.

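* Example data, including a name that merely contains "city"
input str60(city)
"Washington city"
"Boston city"
"El Paso city"
"Lexington-Fayette urban county"
"Audacity"
end

* Remove the longest suffix first, then the shorter ones
gen     wanted = subinstr(city,"urban county","",1)
replace wanted = subinstr(wanted," county","",1)
replace wanted = subinstr(wanted," city","",1)

* Drop any leading or trailing spaces left behind
replace wanted = trim(wanted)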

list

      ---------------------------------------------------- 
     |                           city              wanted |
     |----------------------------------------------------|
  1. |                Washington city          Washington |
  2. |                    Boston city              Boston |
  3. |                   El Paso city             El Paso |
  4. | Lexington-Fayette urban county   Lexington-Fayette |
  5. |                       Audacity            Audacity |
      ---------------------------------------------------- 

Edit: As suggested, I have added a leading space to the search strings so that places with "city" inside their name (e.g. Audacity) are not inadvertently changed. The same applies to "county", although that seems less likely.
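If you are on Stata 14 or later, a regular expression might let you do this in a single call. The following is only a sketch: it assumes the suffix is always at the end of the string, and the "metropolitan government" alternative is my own addition to cover the Nashville-Davidson row from the question.

```stata
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
"Audacity"
end

* Remove one trailing descriptor, anchored at the end of the string.
* The leading space in the pattern keeps names such as "Audacity" intact.
* The " metropolitan government.*" alternative is an assumption, not part
* of the three suffixes listed in the question.
gen wanted = ustrregexrf(city, " (city|urban county|county|metropolitan government.*)$", "")

list city wanted
```

Because the pattern is anchored with $, a descriptor word that appears in the middle of a name is left alone, which the subinstr() version does not guarantee.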

CodePudding user response:

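split could be another way. With limit(1) it creates a single new variable, city1, holding the text before the first matching separator.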

split city, parse(" city" " urban" " county") limit(1)
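For what it is worth, here is a small sketch of how that could look on the sample data from the question. The extra " metropolitan" separator and the variable name wanted are my own assumptions, added so that the Nashville-Davidson row is shortened as well.

```stata
clear
input str60 city
"Washington city"
"Boston city"
"El Paso city"
"Nashville-Davidson metropolitan government (balance)"
"Lexington-Fayette urban county"
end

* Keep only the part before the first separator found in each value.
* " metropolitan" is an assumed extra separator for the Nashville row.
split city, parse(" city" " urban" " county" " metropolitan") limit(1)

* split names its output after the source variable; rename for clarity.
rename city1 wanted

list city wanted
```

Note that split cuts at the first separator it finds anywhere in the string, so a name that itself contains " city" or " county" followed by more text would be cut short.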