Home > Back-end >  Extract a number from string using regex with various formats (MySQL)
Extract a number from string using regex with various formats (MySQL)

Time:10-22

Good morning all,

Long time lurker, first time poster, as I cannot for the life of me figure this out.

I have a mySQL data table which stores metadata for client transactions. I am writing a query to extract a number out of the metadata column, which is essentially a JSON stored as a string.

I am trying to find 'clients' and extract the first number after clients. The data can be stored in several different ways; see the examples below.

  • ..."type\":\"temp\",\"typeOther\":\"\",\"clients\":\"2\",\"hours\":\"5\",\...
  • ..."id\":31457,\"clients\":2,\"cancel\":false\...

I've tried the following Regexp

  • (?<=clients.....)[0-9]
  • (?<=clients...)[0-9]*(?=[[^:digit:]])

And I've tried the following json_extract, but it returned a null value.

  • json_extract(rd.meta, '$.clients')

The regexp functions do work, but the first one only works on the first example, while the second only works on the second example.

What regexp should I use such that it's dynamic and will pull the number nested between two non-word char sets after 'clients'?

CodePudding user response:

clients.*?([0-9] )
^^^^^^^            -- exact match
       ^^^         -- non-greedy string of anything
          ^      ^ -- capture
           ^^^^^^  -- string of 1 or more digits (greedy)

CodePudding user response:

I did this test on MySQL 8.0.29, but it should work on MySQL 5.x too:

mysql> set @s1 = '..."type\":\"temp\",\"typeOther\":\"\",\"clients\":\"2\",\"hours\":\"5\",\...';

mysql> set @s2 = '..."id\\":31457,\\"clients\\":2,\\"cancel\\":false\\...';

mysql> select trim(leading '\\"' from substring_index(@s1, '\\"clients\\":', -1)) as clients;
 -------------------------- 
| clients                  |
 -------------------------- 
| 2\",\"hours\":\"5\",\... |
 -------------------------- 

mysql> select trim(leading '\\"' from substring_index(@s2, '\\"clients\\":', -1)) as clients;
 ------------------------ 
| clients                |
 ------------------------ 
| 2,\"cancel\":false\... |
 ------------------------ 

Then cast the result as an integer to get rid of the non-numeric part following the number.

mysql> select cast(trim(leading '\\"' from substring_index(@s1, '\\"clients\\":', -1)) as unsigned) as clients;
 --------- 
| clients |
 --------- 
|       2 |
 --------- 

mysql> select cast(trim(leading '\\"' from substring_index(@s2, '\\"clients\\":', -1)) as unsigned) as clients;
 --------- 
| clients |
 --------- 
|       2 |
 --------- 
  • Related