Fetch comma separated numbers by regex

Time:12-02

I need to fetch comma-separated integers from a string of a specific format using Ruby's String#match method:

'text PaymentID: 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text Payment ID: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text Payment id 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text paymentid:12345'.match(PATTERN)[1..-1]           # expected result: ['12345']
'text payment id: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text payment ID: 111,999'.match(PATTERN)[1..-1]       # expected result: ['111', '999']
'text payment ID: 111, 222, 333'.match(PATTERN)[1..-1] # expected result: ['111', '222', '333']

So all spaces and the ':' symbol are optional, the pattern should be case-insensitive, and the text before "payment" can contain any characters. My last attempt was not good enough:

PATTERN = /payment[\s]?id[:]?[\s]?(\d+)(?:[,]?[\s]?(\d+))+/i

> 'text Payment id: 12345'.match(PATTERN)[1..-1]
=> ["1234", "5"]
> 'text Payment id: 12345, 333, 91872389'.match(PATTERN)[1..-1]
=> ["12345", "91872389"]

Any ideas on how to achieve this? Thanks in advance.

CodePudding user response:

You can use

text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)

The regex matches

  • (?:\G(?!\A)\s*,|payment\s?id:?) - either the end of the previous successful match followed by zero or more whitespace characters and a comma, or payment, an optional whitespace character, id and an optional colon
  • \s* - zero or more whitespace characters
  • \K - discards what has been consumed so far from the reported match
  • \d+ - one or more digits.
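
As a quick check of the pattern described above, here is a minimal sketch that runs it against the inputs from the question. The constant name PAYMENT_IDS and the helper payment_ids are made up for illustration; only the regex itself comes from this answer.

```ruby
# PAYMENT_IDS and payment_ids are hypothetical names used only for this sketch.
# The regex is the one from the answer: \G(?!\A) continues from the previous
# match, \K drops the prefix so only the digits are reported.
PAYMENT_IDS = /(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i

def payment_ids(text)
  # No capture groups here, so scan returns a flat array of matched strings.
  text.scan(PAYMENT_IDS)
end

samples = [
  'text PaymentID: 12345',
  'text Payment id 12345',
  'text paymentid:12345',
  'text payment ID: 111,999',
  'text payment ID: 111, 222, 333'
]

samples.each { |s| puts "#{s.inspect} => #{payment_ids(s).inspect}" }
```

Because the pattern has no capture groups, scan returns plain strings and no flatten call is needed.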

CodePudding user response:

You can't collect several values by repeating a capture group, since each repetition overwrites the previous capture and only the last one is kept. What you can do instead is use a \G-based pattern with the scan method; \G ensures that successive matches are contiguous:

PATTERN = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

'text Payment id: 12345, 333, 91872389'.scan(PATTERN).flatten

In short, the second branch payment\s*id\s*:? has to succeed first; only then can the first branch (?!\A)\G\s*, succeed for the following matches.
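
To make both points concrete, the sketch below first shows a repeated capture group keeping only its last iteration, then the \G-based pattern collecting every number. The names REPEATED_GROUP, CONTIGUOUS_IDS and extract_payment_ids are assumptions for illustration, and REPEATED_GROUP is a simplified stand-in for the pattern in the question, not a copy of it.

```ruby
# All constant and method names below are hypothetical, chosen for this sketch.

# A repeated group: group 2 is overwritten on every iteration of (?: ... )*.
REPEATED_GROUP = /payment\s*id\s*:?\s*(\d+)(?:\s*,\s*(\d+))*/i

# The \G-based pattern from this answer, used with scan.
CONTIGUOUS_IDS = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

def extract_payment_ids(text)
  # scan returns one array per match because of the capture group,
  # so flatten turns [["1"], ["2"]] into ["1", "2"].
  text.scan(CONTIGUOUS_IDS).flatten
end

input = 'text Payment id: 12345, 333, 91872389'

# Only the first number and the last repetition are available.
p input.match(REPEATED_GROUP).captures  # => ["12345", "91872389"]

# Every number in the contiguous list is returned.
p extract_payment_ids(input)            # => ["12345", "333", "91872389"]

# The chain stops at the first gap: "7, 8" is not attached to the list.
p extract_payment_ids('payment id: 1, 2 and 7, 8')  # => ["1", "2"]
```

The last call illustrates the contiguity that \G provides: once a number is not directly followed by a comma, no later number can match unless another "payment id" prefix appears.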
