Home > Software design >  Regex - find specific sequence of characters, some are not letters, digits or underscore
Regex - find specific sequence of characters, some are not letters, digits or underscore

Time:09-27

I am new to regex and need to search a string field in Impala for multiple matches to this exact sequence of characters: ~FC*

since the ~ and * are not letters or digits, I am unsure on how to search for these in this specific order and not just for any of these single characters occuring.

This is my code so far, have tried both of these [~FC*] or ^~FC*$

This is a test string, it has 2 occurrences:

N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~

CodePudding user response:

.*(~FC\*).* or .*(\~FC\*).*

.* - Zero or more characters
.*(~FC\*).* - Means Search for ~FC* 

    

if 1st one does not work, please try second one, it might work if tilde symbol is reserved in regex.

CodePudding user response:

You can use a simple SQL like below. This will work only on hardcoded string.

select (length(mycol)- length (replace(mycol,'~FC*','')))/length('~FC*') as occurance_str

Here is the SQL i tested ok

select 
(length('N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~')
- length(replace('N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~','~FC*',''))
)/length('~FC*') as occurance_str
  • Related