I want to automatically test whether a string contains only one type of character, with the result stored in a true/false variable "check".
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
My attempt:
gen check = .
// create one indicator variable per character position
capture drop check_*
forvalues i = 1/11 {
    quietly gen check_`i' = .
}
// loop through dataset
local db = _N
forvalues x = 1/`db' {
    display as error "obs `x'"
    // get first character in string
    local f = substr(contactno[`x'], 1, 1)
    // loop through each character in string
    forvalues i = 1/11 {
        local j = substr(contactno[`x'], `i', 1)
        // tag characters that match
        if "`j'" == "`f'" {
            quietly replace check_`i' = 1 in `x'
        }
        else {
            quietly replace check_`i' = 0 in `x'
        }
    }
}
Expected results: the first two observations should be true and the third false.
Answer:
You can achieve this in one line of code as follows:
- Take the first character of contactno.
- Find all instances of this character in contactno and replace them with an empty string (i.e., "").
- Test whether the resulting string is empty.
gen check = missing(subinstr(contactno,substr(contactno,1,1),"",.))
---------------------
| contactno check |
|---------------------|
1. | aaaaaaaaaaa 1 |
2. | bbbbbbbbbbb 1 |
3. | aaaaaaaaaab 0 |
---------------------
This leverages the fact that if any character differs from the first character, the string cannot contain only one type of character: removing every copy of the first character then leaves a non-empty string.
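To make the three steps easier to inspect, the one-liner can also be written out one step at a time. This is only a sketch: it assumes contactno is already in memory, and the variable names first, stripped, and check_steps are hypothetical names chosen for illustration.

```stata
* Step 1: take the first character of contactno
gen str1 first = substr(contactno, 1, 1)

* Step 2: remove every instance of that character
* (a missing fourth argument means "replace all occurrences")
gen stripped = subinstr(contactno, first, "", .)

* Step 3: the string has one type of character if nothing is left over
gen byte check_steps = missing(stripped)

list contactno first stripped check_steps
```

Note that an empty contactno would also be flagged as 1 here, because stripping an empty string leaves an empty string. If that matters, a condition such as contactno != "" could be added to the final step.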
Answer:
Here's another way to do it.
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end
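* tag each observation so the result can be matched back to the original data
gen long id = _n
save original_data, replace
* one copy of each observation per character position
expand 11
bysort id : gen character = substr(contactno, _n, 1)
* after sorting within id, all characters are equal only if the first equals the last
bysort id (character) : gen byte OK = character[1] == character[_N]
drop character
* reduce back to one row per original observation
bysort id : keep if _n == 1
merge 1:1 id using original_data
list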
-------------------------------------
| contactno id OK _merge |
|-------------------------------------|
1. | aaaaaaaaaaa 1 1 Matched (3) |
2. | bbbbbbbbbbb 2 1 Matched (3) |
3. | aaaaaaaaaab 3 0 Matched (3) |
-------------------------------------
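If reshaping the data with expand and merge is not wanted, the same comparison can be sketched as a loop over character positions, which is closer in spirit to the attempt in the question. This assumes contactno is in memory and that every string has the 11 characters used in this example. OK_loop is a hypothetical variable name.

```stata
* Start by assuming every string passes
gen byte OK_loop = 1

* Compare each later position with the first character, across all observations at once
forvalues i = 2/11 {
    quietly replace OK_loop = 0 if substr(contactno, `i', 1) != substr(contactno, 1, 1)
}

list contactno OK_loop
```

Because each replace acts on the whole variable, no loop over observations is needed. Strings shorter than 11 characters would be flagged as 0, since substr() returns an empty string past the end of the value.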