I have a table of urls that are not valid.
I need to pull a query with only valid urls but also clean up any urls that I can.
Such as, trimming up any sting after .com
Requirements:
• Web address must be in a URL format (http://abc.xxx
, http://www.abc.xxx
or www.abc.xxx
)
select url,
case
when regexp_like(r.url, '^(http(s)*://)*[a-z0-9.-] $')
then r.url
else null
end webAddress2
from url r
URL table:
id URL
1 http://www.a.com/NSLIJ/lij
2 [email protected]
3 /www.smithtowncenter.com
4 www.b.com
5 http://b.com
6 http://www.b.com
I've used regexp_like
but it's not quite correct and doesn't trim
Any help would be appreciated.
Query results (should look like)
id URL newURL
1 http://www.a.com/NSLIJ/lij http://www.a.com (trimmed off `/NSLIJ/lij`
2 [email protected]
3 /www.smithtowncenter.com www.smithtowncenter.com (trimmed off `/`)
4 www.b.com www.b.com
5 http://.b.com http://.b.com
6 http://www.b.com http://www.b.com
CodePudding user response:
You can use the regexp ^(http(s)?://)?[a-z0-9] (\.[a-z0-9] ) (/[a-zA-Z] )*?$
to find correct matches.
For example:
select t.*,
case when regexp_like(url, '^(http(s)?://)?[a-z0-9] (\.[a-z0-9] ) (/[a-zA-Z] )*?$')
then 'valid' else 'invalid' end as chk
from t;
Result:
ID URL CHK
--- --------------------------- -------
1 http://www.a.com/NSLIJ/lij valid
2 [email protected] invalid
3 /www.smithtowncenter.com invalid
4 www.b.com valid
5 http://b.com valid
6 http://www.b.com valid
See running example at db<>fiddle.