Stata test if string contains same character

Time:04-14

I want to automatically test whether each string contains only one type of character, with the result stored in a true/false variable "check".

input str11 contactno 
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

My attempt:

gen byte check = 1
// create one indicator per character position
capture drop check_*
forvalues i = 1/11 {
    quietly gen byte check_`i' = .
}
// loop through dataset
local db = _N
forvalues x = 1/`db' {
    display "obs `x'"
    // get first character in string
    local f = substr(contactno[`x'], 1, 1)
    // loop through each character in string
    forvalues i = 1/11 {
        local j = substr(contactno[`x'], `i', 1)
        // tag characters that match the first one
        if "`j'" == "`f'" {
            quietly replace check_`i' = 1 in `x'
        }
        else {
            quietly replace check_`i' = 0 in `x'
            // any mismatch means the string has more than one type of character
            quietly replace check = 0 in `x'
        }
    }
}

Expected result: the first two observations should be true and the third false.

User response:

You can achieve this in one line of code as follows:

  1. Take the first character of contactno.
  2. Find all instances of this character in contactno and replace with an empty string (i.e., "").
  3. Test whether the resulting string is empty.

gen check = missing(subinstr(contactno, substr(contactno, 1, 1), "", .))


      --------------------- 
     |   contactno   check |
     |---------------------|
  1. | aaaaaaaaaaa       1 |
  2. | bbbbbbbbbbb       1 |
  3. | aaaaaaaaaab       0 |
      --------------------- 

This relies on the fact that if any character differs from the first character, the string cannot consist of only one (type of) character.
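
To make the logic easier to follow, the three steps above can be written out as separate variables. This is only an illustrative sketch: the names first, remainder, and check_steps are made up here and are not part of the original answer.

```stata
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

* Step 1: take the first character of each string
gen str1 first = substr(contactno, 1, 1)

* Step 2: remove every occurrence of that character
gen remainder = subinstr(contactno, first, "", .)

* Step 3: an empty remainder means only one type of character was present
gen byte check_steps = missing(remainder)

list
```

The one-line version simply nests these three expressions, so check_steps should agree with check for every observation.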

User response:

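Here's another way to do it: expand the data to one observation per character, sort the characters within each string, and check whether the first and last are the same.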

clear 
input str11 contactno 
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

gen long id = _n
save original_data, replace 
expand 11 
bysort id : gen character = substr(contactno, _n, 1)
bysort id (character) : gen byte OK = character[1] == character[_N]
drop character 
bysort id : keep if _n == 1 
merge 1:1 id using original_data 

list 

      ------------------------------------- 
     |   contactno   id   OK        _merge |
     |-------------------------------------|
  1. | aaaaaaaaaaa    1    1   Matched (3) |
  2. | bbbbbbbbbbb    2    1   Matched (3) |
  3. | aaaaaaaaaab    3    0   Matched (3) |
      ------------------------------------- 
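
Note that expand 11 assumes every string is exactly 11 characters long. If lengths can vary, one possible variation is to compare each position with the first character directly, skipping positions beyond the end of the string. This is a rough sketch under that assumption; the names len, maxlen, and same are hypothetical.

```stata
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

* Length of each string and the longest length in the data
gen int len = strlen(contactno)
quietly summarize len, meanonly
local maxlen = r(max)

* Start by assuming every string has a single type of character
gen byte same = 1

* Compare each later position with the first character,
* ignoring positions past the end of shorter strings
forvalues i = 2/`maxlen' {
    quietly replace same = 0 if `i' <= len & substr(contactno, `i', 1) != substr(contactno, 1, 1)
}

list
```

This loops over character positions rather than observations, so it needs no expand or merge step.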