I am running a query on a table to return the URLs of different organizations, and I am trying to remove any instance of 'https://', 'http://' and 'www.' at the beginning, and any '/' at the end.
So, for example, if I have a URL that currently returns 'https://www.pizzatest.com/',
I need it to just be 'pizzatest.com'.
This is for importing organization domains into Zendesk.
I have found a query that works for removing 'https://' and 'http://' and anything from the first '/' or '?' after the host name, but I cannot figure out how to remove the 'www.' from the beginning.
The query was taken from the question "Extract hostname from a URL" (credit to Fenton):
/* Get just the host name from a URL */
SUBSTRING(@WebAddress,
    /* Starting Position (After any '//') */
    (CASE WHEN CHARINDEX('//', @WebAddress) = 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END),
    /* Length (ending on first '/' or on a '?') */
    CASE
        WHEN CHARINDEX('/', @WebAddress, CHARINDEX('//', @WebAddress) + 2) > 0
            THEN CHARINDEX('/', @WebAddress, CHARINDEX('//', @WebAddress) + 2)
                 - (CASE WHEN CHARINDEX('//', @WebAddress) = 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END)
        WHEN CHARINDEX('?', @WebAddress, CHARINDEX('//', @WebAddress) + 2) > 0
            THEN CHARINDEX('?', @WebAddress, CHARINDEX('//', @WebAddress) + 2)
                 - (CASE WHEN CHARINDEX('//', @WebAddress) = 0 THEN 1 ELSE CHARINDEX('//', @WebAddress) + 2 END)
        ELSE LEN(@WebAddress)
    END
) AS 'HostName'
CodePudding user response:
You can use regexp_replace() to remove the different parts, then use trim()
to get rid of any leading or trailing /
trim(regexp_replace(the_column, '^(https?://)?(www\.)?', ''), '/')
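Here is a minimal sketch of the same idea as a full query, assuming PostgreSQL (the four-argument regexp_replace() with a flags string is PostgreSQL syntax). The pattern is anchored to the start of the string and the dot is escaped, so only a leading scheme and 'www.' are removed. The sample_urls name and its rows are made up for illustration; substitute your own table and column.

```sql
-- PostgreSQL sketch; sample_urls and its values are hypothetical test data.
WITH sample_urls(url) AS (
    VALUES ('https://www.pizzatest.com/'),
           ('http://pizzatest.com'),
           ('WWW.PizzaTest.com/'),
           ('https://awwwards.example/')
)
SELECT url,
       trim(
           -- ^ anchors the match at the start of the string,
           -- \. matches a literal dot, and the 'i' flag ignores case.
           regexp_replace(url, '^(https?://)?(www\.)?', '', 'i'),
           '/'   -- strip any leading or trailing slash that remains
       ) AS domain
FROM sample_urls;
-- domain: pizzatest.com, pizzatest.com, PizzaTest.com, awwwards.example
```

Because the match is anchored, a 'www' that appears inside the host name (as in the last row) is left alone. Note that this only trims a trailing '/'; it does not cut off a path or query string the way the SUBSTRING version in the question does.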