Home > Software design >  How to find variable pattern in MySql with Regex?
How to find variable pattern in MySql with Regex?

Time:10-07

I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.

The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.

The second select statement is more rudimentary, but I am unsure which operators to use.

Which would be the quickest solution?

SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName

SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%' 
-- Will take a substring of this result as well

Example Data:

randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc

In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.

EDIT

SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable

This expression is still not grabbing the right data, but I feel like it may get me closer.

CodePudding user response:

I suggest using

REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')

Details:

  • (?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or &, random_code=, and then zero to 256 chars other than & and # followed with a - char
  • [a-zA-Z]{3} - three ASCII letters
  • [0-9]{3,4} - three to four ASCII digits
  • (?![^&#]) - that are followed either with &, # or end of string.

See the enter image description here

CodePudding user response:

I'd make use of capture groups: (?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&]) I assume with [^=\-] you wanted to capture string with "-","\" or "=" in front but not include those chars in the result. To do that use "positive lookbehind" (?<=. I also added a lookahead (?= for "&". If you'd like to fidget more with regex I recommend RegExr

  • Related