Home > Blockchain >  TRIM values after pattern
TRIM values after pattern

Time:02-17

I have a table of urls that are not valid.

I need to pull a query with only valid urls but also clean up any urls that I can.

Such as, trimming up any sting after .com

Requirements: • Web address must be in a URL format (http://abc.xxx , http://www.abc.xxx or www.abc.xxx)

 select url,
 case
 when regexp_like(r.url, '^(http(s)*://)*[a-z0-9.-] $')
 then r.url
 else null 
 end webAddress2
 from url r

URL table:

id  URL  
 1  http://www.a.com/NSLIJ/lij
 2  [email protected]
 3  /www.smithtowncenter.com
 4  www.b.com
 5  http://b.com
 6  http://www.b.com

I've used regexp_like but it's not quite correct and doesn't trim

Any help would be appreciated.

Query results (should look like)

 id  URL                             newURL
 1  http://www.a.com/NSLIJ/lij     http://www.a.com (trimmed off `/NSLIJ/lij`
 2  [email protected]
 3  /www.smithtowncenter.com         www.smithtowncenter.com (trimmed off `/`)
 4  www.b.com                        www.b.com
 5  http://.b.com                    http://.b.com
 6  http://www.b.com                 http://www.b.com

CodePudding user response:

You can use the regexp ^(http(s)?://)?[a-z0-9] (\.[a-z0-9] ) (/[a-zA-Z] )*?$ to find correct matches.

For example:

select t.*,
  case when regexp_like(url, '^(http(s)?://)?[a-z0-9] (\.[a-z0-9] ) (/[a-zA-Z] )*?$') 
  then 'valid' else 'invalid' end as chk
from t;

Result:

 ID  URL                         CHK     
 --- --------------------------- ------- 
 1   http://www.a.com/NSLIJ/lij  valid   
 2   [email protected]                  invalid 
 3   /www.smithtowncenter.com    invalid 
 4   www.b.com                   valid   
 5   http://b.com                valid   
 6   http://www.b.com            valid   

See running example at db<>fiddle.

  • Related