Home > Software design >  How to parse an inconsistent string in Snowflake?
How to parse an inconsistent string in Snowflake?

Time:09-16

So I have a column of data in a table like this

ID    site
001   ://site.com/love/Walter-White
002   ://site.com/love/Jesse-Pinkman
003   ://site.com/
004   ://site.com/love/Hank-Schraeder?utm_source=InstagramO&utm_medium=referral&utm_campaign=Organic&utm_content=Schraeder-Brau
005   ://site.com/love/Jesse-Pinkman
006   ://site.com/contact

I want to create a new identifier which tells us if the page has the /love/ pattern. If they do, then I want the person's name, which is always the format of a string, seperated by - for full names..

The issue that I am having is when there's more data in the string, which we see when data is followed by the ?. I just want what's inbetween the /love/ and ? but the ? is not always there...

Final result:

ID   love_site?  name
001  YES         Walter-White
002  YES         Jesse-Pinkman
003  NO          
004  YES         Hank-Schraeder
005  YES         Jesse-Pinkman
006  NO

CodePudding user response:

You can use regexp_substr():

select t.*,
       (case when site like '%/love/%' then 'Y' else 'N' end) as love_site,
       regexp_substr(site, '/love/([a-zA-Z]*-[a-zA-Z]*)', 1, 1, 'e')
from t;
  • Related