In Stata I have split a variable in which up to 20 countries were separated by commas, so I now have twenty different variables (country1 to country20). However, the same country can be listed in more than one of the variables country1 to country20. For instance, Uganda may be in country1, country2 and country5. Now I want to create one variable for each country (1 if true, 0 if false). So, basically, I want one variable for each of the twenty countries. I tried this, but it did not work.
local N = _N
forvalues i = 1/`N' {
local s1 = Countryies1 [`i']
local s2 = Countryies2 [`i']
local s3 = Countryies3 [`i']
local s4 = Countryies4 [`i']
local s5 = Countryies5 [`i']
local s6 = Countryies6 [`i']
local s7 = Countryies7 [`i']
local s8 = Countryies8 [`i']
local s9 = Countryies9 [`i']
local s10 = Countryies10 [`i']
local s11 = Countryies11 [`i']
local s12 = Countryies12 [`i']
local s13 = Countryies13 [`i']
local s14 = Countryies14 [`i']
local s15 = Countryies15 [`i']
local s16 = Countryies16 [`i']
local s17 = Countryies17 [`i']
local s18 = Countryies18 [`i']
local s19 = Countryies19 [`i']
local s20 = Countryies20 [`i']
local intersection: list s1 & s2 & s3 & s4 & s5 & s6 & s7 & s8 & s9 & s10 & s11 & s12 & s13 & s14 & s15 & s16 & s17 & s18 & s19 & s20
replace country ="`intersection'" in `i'
}
Answer:
This seems to work -- and does not in any sense rule out other solutions.
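clear
* toy data: one comma-separated list of countries per observation
input str42 countries
"Uganda"
"Uganda, Kenya"
"Uganda, Kenya, Tanzania"
"South Africa"
end
gen id = _n
save datasofar, replace

* one row per (id, country)
keep id countries
split countries, parse(,)
drop countries
reshape long countries, i(id) j(which)
drop if missing(countries)
replace countries = trim(countries)

* legal variable names, e.g. "South Africa" becomes South_Africa
gen name = strtoname(countries)
levelsof name, local(names)
gen new_id = _n

* one indicator per country, labelled with the original spelling
foreach n of local names {
    gen is_`n' = name == "`n'"
    su new_id if is_`n', meanonly
    label var is_`n' "`=countries[r(min)]'"
    local vars `vars' is_`n'
}

* back to one row per id, then reattach the original variable
collapse (max) `vars', by(id)
merge 1:1 id using datasofar
list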
     +----------------------------------------------------------------------------------------+
     | id   is_Kenya   is_Sou~a   is_Tan~a   is_Uga~a                 countries        _merge |
     |----------------------------------------------------------------------------------------|
  1. |  1          0          0          0          1                    Uganda   Matched (3) |
  2. |  2          1          0          0          1             Uganda, Kenya   Matched (3) |
  3. |  3          1          0          1          1   Uganda, Kenya, Tanzania   Matched (3) |
  4. |  4          0          1          0          0              South Africa   Matched (3) |
     +----------------------------------------------------------------------------------------+
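If the data are already split, as in the question, the `split` step can presumably be skipped and the reshape applied to the existing variables directly. What follows is only a sketch under assumptions: the variable names `country1` to `country3` and the toy values are made up for illustration, and the real data would have `country1` to `country20`.

```stata
* Toy data standing in for the already split variables (hypothetical values)
clear
input str12 country1 str12 country2 str12 country3
"Uganda" "" ""
"Uganda" "Kenya" "Uganda"
"Kenya" "Tanzania" ""
end
gen id = _n

* One row per (id, slot); empty slots are dropped
reshape long country, i(id) j(which)
replace country = trim(country)
drop if missing(country)

* One indicator per distinct name, then back to one row per id
gen name = strtoname(country)
levelsof name, local(names)
local vars
foreach n of local names {
    gen byte is_`n' = name == "`n'"
    local vars `vars' is_`n'
}
collapse (max) `vars', by(id)
list
```

Because `collapse (max)` takes the maximum within each `id`, a country repeated across several slots (Uganda in the second row of the toy data) still yields a single 1.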
Another kind of solution is just to loop over the names, so
foreach c in Uganda Kenya Tanzania "South Africa" {
local C = strtoname("`c'")
gen is_`C' = strpos(countries, "`c'") > 0
}
but watch out -- variations in spelling will bite you. They will bite with the earlier code too.
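A related trap with `strpos()` is that it matches substrings, so a name contained in a longer name (Niger inside Nigeria) would be flagged as well. One possible guard, sketched here on the assumption that entries are separated by exactly a comma and one space, is to wrap each list in commas and search for the delimited name. The variable `padded` and the toy data are hypothetical.

```stata
clear
input str42 countries
"Niger"
"Nigeria, Uganda"
"Uganda, South Africa"
end

* Wrap each list in commas and remove the space after each comma,
* so every entry appears as ",name,"
gen padded = "," + subinstr(countries, ", ", ",", .) + ","

foreach c in Niger Nigeria Uganda "South Africa" {
    local C = strtoname("`c'")
    * match the whole entry, not a substring of a longer name
    gen byte is_`C' = strpos(padded, ",`c',") > 0
}
drop padded
list
```

This does nothing for genuine spelling variants; a quick `tabulate` of the distinct names before creating indicators is still worthwhile.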