Regular expression for highlighting numbers between words

Time:03-30

Site users enter numbers in different ways, for example:

from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs

I am looking for a regular expression that captures the words before the digits (if there are any), the digits in any format, and the words after them (if there are any). Ideally, the spaces should be excluded.

This is the pattern I have so far, but it does not work correctly:

(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)

The main purpose is to clean up the strings, bring them to a uniform form, format the numbers with PHP's number formatting, and so on.

As a result, I need the text before the digits, the digits themselves and the text after them in separate variables:

$before = 'from';
$num    = '8000';
$after  = 'packs';

Thank you for any help in this matter.

CodePudding user response:

I think you may try this:

^(\D+)?([\d \t]+)(\D+)?$
  1. group 1: optional (?) group that captures anything but digits
  2. group 2: mandatory group that captures only digits and whitespace characters such as spaces and tabs
  3. group 3: optional (?) group that captures anything but digits

Demo

Source (run)

$re = '/^(\D+)?([\d \t]+)(\D+)?$/m';
$str = 'from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

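foreach ($matches as $matchgroup)
{
    // Group 3 can be absent when nothing follows the number, so default it to ''.
    echo "before: ".trim($matchgroup[1] ?? '')."\n";
    // Strip the spaces and tabs inside the number.
    echo "number: ".preg_replace('/\D/','',$matchgroup[2])."\n";
    echo "after: ".trim($matchgroup[3] ?? '')."\n";
    echo "\n\n";
}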

CodePudding user response:

I corrected your regex and added named groups. It looks like this:

^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$

Test regex here: https://regex101.com/r/QLEC9g/2

By using groups you can easily separate the words and numbers, and handle them any way you want.
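As a rough illustration of reading those named groups from PHP, here is a minimal sketch. The helper name splitLine and the constant NAMED_RE are made up for this example, and stripping the whitespace from the number is an assumption based on the output the question asks for.

```php
<?php
// Pattern from this answer, wrapped in PHP delimiters.
const NAMED_RE = '/^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$/';

// Hypothetical helper: returns the three parts, or null if the line does not match.
function splitLine(string $line): ?array
{
    if (preg_match(NAMED_RE, $line, $m) !== 1) {
        return null;
    }
    return [
        // Optional groups that did not take part in the match may be empty or missing.
        'before' => $m['before'] ?? '',
        // Remove the whitespace inside the number, e.g. "8 000" becomes "8000".
        'number' => preg_replace('/\s+/', '', $m['number']),
        'after'  => $m['after'] ?? '',
    ];
}

print_r(splitLine('from 8 000 packs'));
```

Note that this pattern allows only a single word on each side of the number, so a line such as "test from 8 000 packs test" does not match and the helper returns null.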

CodePudding user response:

Your pattern does not match every line because it has 4 required parts that each need at least one character to be present:

(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
  ^^^^^^^^^^^^    ^^ ^^^^^    ^^

The other thing to note is that the first character class [0-9|a-zA-Z] can also match digits (you can omit the | as it would match a literal pipe char)
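To see both points in practice, the sketch below runs the pattern from the question over its sample lines. The helper name tryOriginal and the constant ORIGINAL_RE are invented for this illustration.

```php
<?php
// The pattern from the question, unchanged, wrapped in PHP delimiters.
const ORIGINAL_RE = '/(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)/';

// Hypothetical helper: returns [before, number, after], or null when there is no match.
function tryOriginal(string $line): ?array
{
    return preg_match(ORIGINAL_RE, $line, $m) === 1 ? array_slice($m, 1) : null;
}

$lines = ['from 8 000 packs', '432534534', 'from 344454 packs',
          '45054 packs', '04 555', '434654', '54 564 packs'];

foreach ($lines as $line) {
    $parts = tryOriginal($line);
    // Print the line next to its captured groups, or "no match".
    echo str_pad($line, 20), $parts === null ? 'no match' : implode(' | ', $parts), "\n";
}
```

Only three of the seven lines match, because the other four do not contain the two required whitespace characters. The last line does match, but "54" ends up in the first group, since that character class also accepts digits.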


If any characters other than digits are allowed on the left and right, and at least a single digit must be present, you can use a negated character class [^\d\r\n]* that optionally matches any character except a digit or a newline:

^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$
  • ^ Start of string
  • ([^\d\r\n]*) Capture group 1, match any char except a digit or a newline
  • \h* Match optional horizontal whitespace chars
  • (\d+(?:\h+\d+)*) Capture group 2, match 1+ digits and optionally repeat matching 1+ horizontal whitespace chars and 1+ digits
  • \h* Match optional horizontal whitespace chars
  • ([^\d\r\n]*) Capture group 3, match any char except a digit or a newline
  • $ End of string

See a regex demo and a PHP demo.

For example

$re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/m';
$str = 'from 8 000 packs
test from 8 000 packs test
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach($matches as $match) {
    list(,$before, $num, $after) = $match;
    echo sprintf(
        "before: %s\nnum:%s\nafter:%s\n--------------------\n",
        $before, preg_replace("/\h+/", "", $num), $after
    );
}

Output

before: from 
num:8000
after:packs
--------------------
before: test from 
num:8000
after:packs test
--------------------
before: 
num:432534534
after:
--------------------
before: from 
num:344454
after:packs
--------------------
before: 
num:45054
after:packs
--------------------
before: 
num:04555
after:
--------------------
before: 
num:434654
after:
--------------------
before: 
num:54564
after:packs
--------------------
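
The question also mentions bringing the numbers to one format afterwards. One possible way, sketched below, is PHP's number_format with a space as the thousands separator. The helper name formatCount and the choice of separator are assumptions made for illustration. Casting to int drops a leading zero, so '04555' comes out as 4 555.

```php
<?php
// Hypothetical helper: normalise a captured digit string and group it by thousands.
function formatCount(string $digits): string
{
    // Remove anything that is not a digit, e.g. the spaces in "54 564".
    $clean = preg_replace('/\D+/', '', $digits);
    // No decimals, no decimal separator, a space between thousands.
    return number_format((int) $clean, 0, '', ' ');
}

echo formatCount('8000'), "\n";      // 8 000
echo formatCount('432534534'), "\n"; // 432 534 534
```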

If there should be at least a single digit present, and the only allowed characters are a-z for the word(s), you can use a case insensitive pattern:

(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$

See another regex demo and a php demo.
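
A minimal sketch of how this stricter pattern could be used from PHP follows. The helper name splitWords and the constant WORDS_RE are hypothetical, and removing the whitespace inside the number is an assumption based on the output the question asks for.

```php
<?php
// Letters-only variant from this answer; (?i) makes it case insensitive.
const WORDS_RE = '/(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$/';

// Hypothetical helper: returns [before, num, after], or null when the line does not fit.
function splitWords(string $line): ?array
{
    if (preg_match(WORDS_RE, $line, $m) !== 1) {
        return null;
    }
    // Strip the horizontal whitespace inside the number; default group 3 to ''.
    return [$m[1], preg_replace('/\h+/', '', $m[2]), $m[3] ?? ''];
}

var_dump(splitWords('Test FROM 8 000 Packs test'));
var_dump(splitWords('from 8,000 packs')); // NULL, the comma is not an allowed character
```

Because the surrounding whitespace is matched by \h* outside the capture groups, the words come back without leading or trailing spaces.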
