I am trying to get a name from a list of websites and need only the first string or word to be more precise. I have found a way in Python but I am interested in doing it with PostgreSQL
For example I would need happy from https://www.happy.com
.
I used following code but I think I am going in a wrong direction:
WITH test AS (SELECT regexp_replace('https://www.happy.com','(https://www.|https://|http://)',''))
SELECT SPLIT_PART(test,'.',0)
CodePudding user response:
You can use the following REGEXP_REPLACE
:
SELECT REGEXP_REPLACE('https://www.happy.com',
'^(?:(?:ht|f)tps?://(?:www\.)?|www\.)(.*)\.[^.] $',
'\1') AS Result;
See the regex demo.
Details:
^
- start of string(?:(?:ht|f)tps?://(?:www\.)?|www\.)
-http://www.
,https://www.
,ftp://www.
,ftps://www.
,https://
,http://
,ftps://
,ftp://
or justwww.
(.*)
- Group 1 (\1
): any zero or more chars as many as possible\.
- a.
char[^.]
- one or more chars other than a.
char$
- end of string.
CodePudding user response:
Thank you very much Wiktor. I really appreciate it not just because you gave me the syntax but because I am fairly new to REGEX and your detailed explanation helped understand better filtering.