Extract Substring from a string using SQL-CodePudding

Body
Beta M yy-Tz
Beta M zz-Ox
Beta W yy-Tz
Tiger W Dash
Tiger M Dash
Tiger White Stone W Pearl
Tiger White Stone M Pearl

I have a column 'Body' in a snowflake table xyz which contains the products being ordered. The 'Body' field is composed of three parts.
1st part: Product Name (Ex: Beta, Tiger, Tiger White Stone)

2nd part: Gender M or W and

3rd part is internal product naming (Ex: yy | Tz, zz | Ox, Dash and Pearl).

I want to extract the M and W in the separate column named Gender to know how many men and women customers I have.

Expected output:

Gender
M
M
W
W
M
W
M

I tried to use the split function but I feel it is not an efficient way. Is there a better way to get this implemented?

CodePudding user response：

Because the "parts" of the string can contain spaces then using spaces to find the 2nd part is probably not reliable. I'm going to assume that the patterns you are looking for are ' M ' and ' W ' but obviously if these patterns can exist in the first or third part this won't work.

Try this:

CASE WHEN POSITION(' M ', BODY) > 0 THEN 'M'
     WHEN POSITION(' W ', BODY) > 0 THEN 'W'
     ELSE 'X' END

CodePudding user response：

Yet another option is using the Snowflake REGEXP_SUBSTR function, with a regex expression that matches the single character right before your 3rd defined group:

SELECT REGEXP_SUBSTR(Body, ' ([A-Z]) [\\w\\-] $', 1, 1, 'e') AS Gender
FROM tab

The regex looks for:

([A-Z]): space uppercase character (Group 1) space
[\\w\\-] : any combination of alphanumerical characters and dashes
$: end of string

The three parameters 1, 1, 'e' stand for:

1, : start from the first char
, 1,: get the first occurrence
'e': extract the (first) group

Does it solve your problem?