Home > Back-end >  Getting first string from a web address
Getting first string from a web address

Time:06-25

I am trying to get a name from a list of websites and need only the first string or word to be more precise. I have found a way in Python but I am interested in doing it with PostgreSQL

For example I would need happy from https://www.happy.com.

I used following code but I think I am going in a wrong direction:

WITH test AS (SELECT regexp_replace('https://www.happy.com','(https://www.|https://|http://)',''))
SELECT SPLIT_PART(test,'.',0)   

CodePudding user response:

You can use the following REGEXP_REPLACE:

SELECT REGEXP_REPLACE('https://www.happy.com',
                      '^(?:(?:ht|f)tps?://(?:www\.)?|www\.)(.*)\.[^.] $',
                      '\1') AS Result;

See the regex demo.

Details:

  • ^ - start of string
  • (?:(?:ht|f)tps?://(?:www\.)?|www\.) - http://www., https://www., ftp://www., ftps://www., https://, http://, ftps://, ftp:// or just www.
  • (.*) - Group 1 (\1): any zero or more chars as many as possible
  • \. - a . char
  • [^.] - one or more chars other than a . char
  • $ - end of string.

CodePudding user response:

Thank you very much Wiktor. I really appreciate it not just because you gave me the syntax but because I am fairly new to REGEX and your detailed explanation helped understand better filtering.

  • Related