Home > Net >  Get text before certain words using regex in redshift
Get text before certain words using regex in redshift

Time:02-02

I want to get text before certain part of the string. In fact, it is name of the company in each case which appears at the start of the string (it can be multiple words). Space characters may vary. Some expected outputs are:

Text                                  output
Company-1 Gift Card $100              Company-1 
Company 2 Digital Gift Card $100      Company 2
Company2 Digital Gift Card $50        Company2
Company3 Digital Gift Card $100       Company3
Company4 Gift Cards $100 - Digital    Company4
Company2 Digital Gift Card $250       Company2
Company1 Digital Gift Card $50        Company1
Company4 Digital Gift Card $100       Company4
Company_5 Digital Gift Cards $100     Company_5
Company1 Digital Gift Card $250       Company1
Company 6 Gift Card $100 - Physical   Company 6
Company2 Gift Card $100 - Digital     Company2

I have done the following, but it seems that non-capturing mode of regular expression (?:) doesn't work in redshift. So, don't know how to solve this.

(^.*)(?:[\\s] (?:Digital)?[\\s].*)

Thanks for your help.

CodePudding user response:

I still hope someone can find an answer using regex. I found a non regex solution using split_part for now:

CASE WHEN LOWER(text) LIKE '%digital' then split_part(LOWER(text), 'gift',1)
    WHEN LOWER(text) LIKE '%digital%' then split_part(LOWER(text), 'digital',1) 
    ELSE split_part(LOWER(text), 'gift',1) END AS company_name

CodePudding user response:

I'm no heavy hitter with SQL nor am I familiar with Redshift, but reading some documentation makes me believe that redshift's regex functions actually do support non-capture groups a.o. different to the POSIX ERE standard.

Use the 5th parameter in the REGEX_REPLACE() function. We simply write 'p' to interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.

In your case, this should then work:

SELECT regexp_replace(text, '(\\s Digital)?\\s Gift Card.*', '', 1, 'p') AS company_name
FROM YourTable;

Matching left-to-right:

  • (\\s Digital)? - Optionally match: 1 whitespace characters before a literal 'Digital';
  • \\s Gift Card - Match: 1 whitespace characters before literally 'Gift Card';
  • .* - Match 0 characters other than newline (remainder of string).

Replace matched substring with an empty string. See an online regex101 demo.

  • Related