Home > Blockchain >  Using REGEXP_SUBSTR to extract website domain
Using REGEXP_SUBSTR to extract website domain

Time:11-20

I have a field called Website with examples that look like:

https://ivalore.com.br
https://cornstalk.com
https://penny.co

I am trying to use REGEXP_SUBSTR to isolate the domain: REGEXP_SUBSTR("Website", '[^https://] ')

Some of the results are working but others are not, for instance I am expecting cornstalk.com and penny.co but I am not receiving those values:

ivalore.com.br
corn
enny.co

Any help would be appreciated.

CodePudding user response:

You can use

SELECT REGEXP_SUBSTR("Website", '^(https?://)?(.*)', 1, 1, 'e', 2)

Details:

  • ^ - start of string
  • (https?://)? - an optional Group 1: http:// or https://
  • (.*) - Group 2: the rest of the string.

The last argument, together with e last but one argument, returns the Group 2 value.

However, REGEXP_REPLACE might be better here:

SELECT REGEXP_REPLACE("Website", '^https?://', '')

That is, just remove the http:// or https:// from the start of a string.

  • Related