I need to fetch comma-separated integers from a string of a specific format using the Ruby String#match
method:
'text PaymentID: 12345'.match(PATTERN)[1..-1] # expected result: ['12345']
'text Payment ID: 12345'.match(PATTERN)[1..-1] # expected result: ['12345']
'text Payment id 12345'.match(PATTERN)[1..-1] # expected result: ['12345']
'text paymentid:12345'.match(PATTERN)[1..-1] # expected result: ['12345']
'text payment id: 12345'.match(PATTERN)[1..-1] # expected result: ['12345']
'text payment ID: 111,999'.match(PATTERN)[1..-1] # expected result: ['111', '999']
'text payment ID: 111, 222, 333'.match(PATTERN)[1..-1] # expected result: ['111', '222', '333']
So all spaces and the ':' symbol are optional, the pattern should be case-insensitive, and the text before
'payment' can contain any characters.
My last variant was not good enough:
PATTERN = /payment[\s]?id[:]?[\s]?(\d+)(?:[,]?[\s]?(\d+))+/i
> 'text Payment id: 12345'.match(PATTERN)[1..-1]
=> ["1234", "5"]
> 'text Payment id: 12345, 333, 91872389'.match(PATTERN)[1..-1]
=> ["12345", "91872389"]
Any ideas on how to achieve this? Thanks in advance.
CodePudding user response:
You can use
text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)
The regex matches
(?:\G(?!\A)\s*,|payment\s?id:?) - either the end of the previous successful match followed by zero or more whitespaces and a comma, or payment, an optional whitespace, id and an optional colon
\s* - zero or more whitespaces
\K - removes what has just been consumed from the match
\d+ - one or more digits.
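As a quick check, the pattern above can be run against the inputs from the question. This is only a sketch: the variable names payment_ids, samples and results are made up for the illustration.

```ruby
# Pattern from this answer: \G continues right after the previous match,
# \K discards the consumed prefix so scan returns only the digits.
payment_ids = /(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i

# Sample inputs taken from the question.
samples = [
  'text PaymentID: 12345',
  'text Payment id 12345',
  'text paymentid:12345',
  'text payment ID: 111,999',
  'text payment ID: 111, 222, 333'
]

# With no capture groups, scan returns a flat array of matched strings.
results = samples.map { |s| s.scan(payment_ids) }
results.each { |r| p r }
# ["12345"]
# ["12345"]
# ["12345"]
# ["111", "999"]
# ["111", "222", "333"]
```

Because the pattern has no capture group, scan returns the matched strings directly and no flatten is needed.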
CodePudding user response:
You can't collect every value from a repeated capture group, since each repetition overwrites the previous one and only the last occurrence is kept. What you can do is use a \G
based pattern that ensures the contiguity between successive matches with the scan method:
PATTERN = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i
'text Payment id: 12345, 333, 91872389'.scan(PATTERN).flatten
In short, the second branch payment\s*id\s*:?
has to succeed first, to allow the first branch (?!\A)\G\s*,
to succeed for the next matches.
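A small sketch of both points, the overwritten capture group and the contiguity that \G enforces. The 'a1b2c3' and 'order 77, ...' strings are made-up inputs for illustration, not from the question.

```ruby
# A repeated capture group only remembers its final repetition.
last_only = 'a1b2c3'.match(/(?:[a-z](\d))+/)[1]
p last_only # "3"

# Same pattern as in this answer, held in a local variable.
pattern = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

# scan returns one array per match because of the capture group,
# hence the flatten.
ids = 'text Payment id: 12345, 333, 91872389'.scan(pattern).flatten
p ids # ["12345", "333", "91872389"]

# Hypothetical input: numbers before "payment id", or after the
# comma-separated run is broken by "and", are not picked up.
stray = 'order 77, payment id: 1, 2 and 3, 4'.scan(pattern).flatten
p stray # ["1", "2"]
```

The last call shows why the \G branch matters: a plain \d+ scan would also return 77, 3 and 4.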