I'm trying to match monetary amounts. Here are all the possible formats:
$5.6 million
$4,1 million
$8,1M
$6.3M
$333,333
$2 million
$5 million
I already have this regex:
\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?
See online demo.
But it does not match these:
$5.6 million
$4,1 million
$8,1M
$6.3M
Any help would be appreciated.
CodePudding user response:
You can use
(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?
If you need to make sure you do not match an m that is part of another word:
(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b
See the regex demo. Details:
(?i) - case insensitive option
\$ - a $ char
\d+ - one or more digits
(?:[.,]\d+)* - zero or more repetitions of . or , and then one or more digits
(?:\s+(?:thousand|[mb]illion)|m)? - an optional occurrence of
  \s+(?:thousand|[mb]illion) - one or more whitespaces and then thousand, million or billion
  | - or
  m - an m char
\b - a word boundary.
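As a quick check, the pattern can be run against the sample amounts from the question. This is a minimal Python sketch, not part of the original answer: the names AMOUNT_RE and samples are made up for illustration, and the pattern is written with the one-or-more quantifiers that the details above describe.

```python
import re

# Pattern from this answer, with the trailing \b so that a lone "m"
# is not accepted when it is the start of a longer word.
AMOUNT_RE = re.compile(r"(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b")

# Sample amounts taken from the question.
samples = [
    "$5.6 million", "$4,1 million", "$8,1M", "$6.3M",
    "$333,333", "$2 million", "$5 million",
]

# fullmatch requires the whole sample to be consumed by the pattern.
matches = [AMOUNT_RE.fullmatch(s) is not None for s in samples]
print(matches)

# search finds an amount embedded in longer text.
embedded = AMOUNT_RE.search("cost $6.3M today")
print(embedded.group() if embedded else None)
```

With the word boundary in place, a string such as "$5marbles" produces no match at all, because neither "$5" nor "$5m" ends at a word boundary.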
CodePudding user response:
Let's look at your regular expression:
\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?
\$\d{1,3} is fine. What follows? One way to answer that is to consider the following three possibilities.
The string to be matched ends with ' million'
This string (which begins with a space, in case you missed that) is preceded by an empty string or a single digit preceded by a comma or period:
(?:[,.]\d)? million
Evidently, "million" can be "thousand" or "billion", and the first letter of "million" or "billion" might be capitalized, so we change the expression to
(?:[,.]\d)? (?:[MmBb]illion|thousand)
One potential problem is that this matches the '$5.6 million' in '$5.6 millionaire'. We can avoid that by appending a word boundary, which prevents the match from being followed by a word character:
(?:[,.]\d)? (?:[MmBb]illion|thousand)\b
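The effect of the word boundary can be seen by running both variants on the 'millionaire' example. This is an illustrative Python sketch; the names loose and strict are made up here, and each pattern is prefixed with the \$\d{1,3} part discussed earlier so that it can be tried on its own.

```python
import re

# The same expression without and with the trailing word boundary.
loose = re.compile(r"\$\d{1,3}(?:[,.]\d)? (?:[MmBb]illion|thousand)")
strict = re.compile(r"\$\d{1,3}(?:[,.]\d)? (?:[MmBb]illion|thousand)\b")

text = "$5.6 millionaire"

loose_hit = loose.search(text)    # matches the "$5.6 million" prefix
strict_hit = strict.search(text)  # no match: "million" is followed by "a"

print(loose_hit.group() if loose_hit else None)
print(strict_hit)
```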
The string ends with 'M'
In this case the 'M' must be preceded by a single digit preceded by a comma or period:
[,.]\dM\b
You could accept 'B' as well by changing M to [MB].
The string ends with three digits preceded by a comma
Here we need
,\d{3}\b
Here the word boundary avoids matching, for example, '$333,3333'. As written, though, it will not match all of '$333,333,333' or '$333,333,333,333'. If we want to match those we could change the expression to
(?:,\d{3})+\b
or, to match '$333' as well, change it to
(?:,\d{3})*\b
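The three variants for comma-separated groups can be compared side by side. The following Python sketch is only an illustration; the names one_group, many_groups and any_groups are hypothetical, and each pattern is prefixed with \$\d{1,3} so it can be tested in isolation.

```python
import re

one_group = re.compile(r"\$\d{1,3},\d{3}\b")         # exactly one ",ddd" group
many_groups = re.compile(r"\$\d{1,3}(?:,\d{3})+\b")  # one or more ",ddd" groups
any_groups = re.compile(r"\$\d{1,3}(?:,\d{3})*\b")   # zero or more ",ddd" groups

# The word boundary rejects a fourth digit after the comma.
bad_group = one_group.search("$333,3333")
print(bad_group)

# fullmatch checks whether the whole string is accepted.
print(one_group.fullmatch("$333,333,333") is not None)    # too many groups
print(many_groups.fullmatch("$333,333,333") is not None)  # accepted
print(many_groups.fullmatch("$333") is not None)          # needs at least one group
print(any_groups.fullmatch("$333") is not None)           # accepted
```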
Construct the alternation
We therefore can use the following regular expression.
\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)\b|[,.]\dM\b|,\d{3}\b)
Factoring out the word boundary we obtain
\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b
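To confirm that the combined expression covers every format listed in the question, it can be tried on the samples. This is a small Python sketch under the assumption that each sample must match in full; FINAL_RE, samples and results are illustrative names, and the pattern is the factored form with a single trailing \b.

```python
import re

# Alternation of the three endings, followed by one shared word boundary.
FINAL_RE = re.compile(
    r"\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b"
)

# Sample amounts taken from the question.
samples = [
    "$5.6 million", "$4,1 million", "$8,1M", "$6.3M",
    "$333,333", "$2 million", "$5 million",
]

# Map each sample to whether the pattern accepts the whole string.
results = {s: FINAL_RE.fullmatch(s) is not None for s in samples}
for sample, ok in results.items():
    print(sample, ok)
```

Note that this form requires one of the three endings, so a bare '$333' is not matched, and '$5.6 millionaire' is rejected by the word boundary.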