Regex to match dollar amount with uppercase letter or word

Time:05-01

I'm trying to match dollar amounts. Here are all the possibilities:

$5.6 million
$4,1 million
$8,1M
$6.3M
$333,333
$2 million
$5 million

I already have this regex:

\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?

See online demo.

But I'm not able to match these:

$5.6 million
$4,1 million
$8,1M
$6.3M

Any help would be appreciated.

CodePudding user response:

You can use

(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?

If you need to make sure you do not match an m that is part of another word:

(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b

See the regex demo. Details:

  • (?i) - case insensitive option
  • \$ - a $ char
  • \d+ - one or more digits
  • (?:[.,]\d+)* - zero or more repetitions of . or , and then one or more digits
  • (?:\s+(?:thousand|[mb]illion)|m)? - an optional occurrence of
    • \s+(?:thousand|[mb]illion) - one or more whitespaces and then thousand, million or billion
    • | - or
    • m - an m char
  • \b - a word boundary.
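As a quick check, the pattern can be run against the sample amounts from the question. This is only a sketch in Python (the question does not say which regex flavor is in use); the names AMOUNT, samples and full_matches are made up for illustration.

```python
import re

# Pattern from this answer, including the trailing word boundary.
# The inline (?i) flag makes "M", "m", "Million", "million" etc. all acceptable.
AMOUNT = re.compile(r"(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b")

# The sample amounts listed in the question.
samples = [
    "$5.6 million",
    "$4,1 million",
    "$8,1M",
    "$6.3M",
    "$333,333",
    "$2 million",
    "$5 million",
]


def full_matches(pattern, strings):
    """Return the strings that the pattern matches from start to end."""
    return [s for s in strings if pattern.fullmatch(s)]


matched = full_matches(AMOUNT, samples)
print(matched == samples)  # True: every sample is matched in full

# Inside longer text, search() picks out just the amount.
found = AMOUNT.search("The deal cost $6.3M last year").group(0)
print(found)  # $6.3M
```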

CodePudding user response:

Let's look at your regular expression:

\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?

\$\d{1,3} is fine. What follows? One way to answer that is to consider the following three possibilities.

The string to be matched ends with ' million'

This string (which begins with a space, in case you missed that) is preceded by an empty string or a single digit preceded by a comma or period:

(?:[,.]\d)? million

Evidently, "million" can also be "thousand" or "billion", and the first letter of "million" or "billion" might be capitalized, so we change the expression to

(?:[,.]\d)? (?:[MmBb]illion|thousand)

One potential problem is that this matches the '$5.6 million' in '$5.6 millionaire'. We can avoid that by tacking on a word boundary, which prevents the match from being followed by a word character:

(?:[,.]\d)? (?:[MmBb]illion|thousand)\b
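To see the effect of the word boundary, here is a small Python sketch (Python is an assumption; the variable names are illustrative). It prefixes the fragment above with \$\d{1,3} and compares the two variants on '$5.6 millionaire'.

```python
import re

# The fragment built so far, without and with the trailing \b.
tail = r"(?:[,.]\d)? (?:[MmBb]illion|thousand)"
without_boundary = re.compile(r"\$\d{1,3}" + tail)
with_boundary = re.compile(r"\$\d{1,3}" + tail + r"\b")

text = "$5.6 millionaire"

loose = without_boundary.search(text)
strict = with_boundary.search(text)

print(loose.group(0))  # $5.6 million  (the unwanted partial match)
print(strict)          # None          (\b rejects "million" followed by "a")

# A genuine amount is still accepted by the stricter pattern.
ok = with_boundary.search("$5.6 million")
print(ok.group(0))     # $5.6 million
```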

The string ends with 'M'

In this case the 'M' must be preceded by a single digit preceded by a comma or period:

[,.]\dM\b

You could accept 'B' as well by changing M to [MB].

The string ends with three digits preceded by a comma

Here we need

,\d{3}\b

Here the word boundary avoids matching, for example, '$333,3333'. It will not match, however, '$333,333,333' or '$333,333,333,333'. If we want to match those we could change the expression to

(?:,\d{3})+\b

or to match '$333' as well, change it to

(?:,\d{3})*\b
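The difference between the two quantifiers can be checked with a short Python sketch (an assumed language; the names one_or_more and zero_or_more are illustrative). Both fragments are prefixed with \$\d{1,3} so they can be tried on whole amounts.

```python
import re

# One or more ",ddd" groups versus zero or more ",ddd" groups.
one_or_more = re.compile(r"\$\d{1,3}(?:,\d{3})+\b")
zero_or_more = re.compile(r"\$\d{1,3}(?:,\d{3})*\b")

# Longer amounts are matched in full by both variants.
long_amount = "$333,333,333,333"
print(one_or_more.search(long_amount).group(0))   # $333,333,333,333
print(zero_or_more.search(long_amount).group(0))  # $333,333,333,333

# Only the zero-or-more variant accepts an amount with no comma groups.
print(one_or_more.search("$333"))                 # None
print(zero_or_more.search("$333").group(0))       # $333

# A malformed group is rejected outright by the one-or-more variant.
# Note that the zero-or-more variant can still backtrack to the bare "$333".
print(one_or_more.search("$333,3333"))            # None
print(zero_or_more.search("$333,3333").group(0))  # $333
```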

Construct the alternation

We therefore can use the following regular expression.

\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)\b|[,.]\dM\b|,\d{3}\b)

Factoring out the word boundary we obtain

\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b
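A minimal Python sketch for trying the factored expression on the amounts from the question (Python and the names FINAL and samples are assumptions for illustration). The expression is written with the word boundary \b at the end, as derived in the steps above.

```python
import re

# Factored expression: three alternatives followed by one shared word boundary.
FINAL = re.compile(
    r"\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b"
)

# The sample amounts listed in the question.
samples = [
    "$5.6 million",
    "$4,1 million",
    "$8,1M",
    "$6.3M",
    "$333,333",
    "$2 million",
    "$5 million",
]

unmatched = [s for s in samples if FINAL.fullmatch(s) is None]
print(unmatched)  # []

# The word boundary still rejects "millionaire".
rejected = FINAL.search("$5.6 millionaire")
print(rejected)   # None

# The alternation is not optional, so a bare "$333" is not matched.
bare = FINAL.search("$333")
print(bare)       # None
```

Unlike the first answer's pattern, this one allows only a single digit after the comma or period and only a single ,ddd group, which is stricter but matches every amount listed in the question.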
