Site users enter numbers in different ways, for example:
from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
I am looking for a regular expression that captures the words before the digits (if there are any), the digits in any format, and the words after them (if there are any). Ideally, the spaces should be excluded from the captures.
This is what I have so far, but it does not work correctly:
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
The main purpose is to put the strings in order: bring them to the same form, format the numbers in PHP, and so on.
In the end, I need the text before the digits, the digits themselves, and the text after them in separate variables, for example:
$before = 'from';
$num = '8000';
$after = 'packs';
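For the formatting step mentioned in the question, here is a minimal sketch. It assumes that "PHP digit format" means PHP's built-in number_format(); the variable names mirror the ones above.

```php
<?php
// Sketch, assuming "PHP digit format" means number_format():
// remove the spaces from the captured number, then format it as an integer.
$num = '8 000';                                       // the number as typed by the user
$digits = preg_replace('/\s+/', '', $num);            // "8000"
echo $digits, "\n";
echo number_format((int) $digits), "\n";              // "8,000" (default separators)
echo number_format((int) $digits, 0, ',', ' '), "\n"; // "8 000" (space as thousands separator)
```

Casting with (int) drops leading zeros, so a value such as 04 555 would come out as 4,555; keep the digit string if leading zeros matter.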
Thank you for any help with this!
Answer:
I think you could try this:
^(\D+)?([\d \t]+)(\D+)?$
- group 1: optional (?) group that contains anything but a digit
- group 2: mandatory group that contains only digits and whitespace characters such as space and tab
- group 3: optional (?) group that contains anything but a digit
Source:
$re = '/^(\D+)?([\d \t]+)(\D+)?$/m';
$str = 'from 8 000 packs
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $matchgroup)
{
    echo "before: " . $matchgroup[1] . "\n";
    // keep only the digits of the number group
    echo "number: " . preg_replace('/\D/', '', $matchgroup[2]) . "\n";
    // group 3 is omitted from $matchgroup when a line has no text after the number
    echo "after: " . ($matchgroup[3] ?? '') . "\n";
    echo "\n\n";
}
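If you only need one line at a time, the same pattern can be wrapped in a small helper. This is a sketch, not part of the original answer: the name splitLine() is hypothetical, and the trim() calls and the empty-number check are additions. The check is there because [\d \t]+ can match a run of spaces on its own, so a line such as "no digits" would otherwise match with an empty number.

```php
<?php
// Sketch: apply this answer's pattern to a single line and return the three
// parts, with the spaces around the words trimmed.
// splitLine() is a hypothetical helper name, not part of the original answer.
function splitLine(string $line): ?array
{
    if (!preg_match('/^(\D+)?([\d \t]+)(\D+)?$/', $line, $m)) {
        return null;
    }
    $num = preg_replace('/\D/', '', $m[2]); // keep only the digits
    if ($num === '') {
        return null; // [\d \t]+ can match spaces alone, e.g. in "no digits"
    }
    return [
        'before' => trim($m[1]),       // '' when the line starts with a digit
        'num'    => $num,
        'after'  => trim($m[3] ?? ''), // group 3 is omitted when it does not participate
    ];
}

print_r(splitLine('from 8 000 packs'));
```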
Answer:
I corrected your regex and added named groups; the regex looks like this:
^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$
Test regex here: https://regex101.com/r/QLEC9g/2
By using groups you can easily separate the words and numbers, and handle them any way you want.
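A minimal sketch of reading the named groups in PHP, assuming one line is matched at a time with preg_match(). The matches array holds each named group under its name as well as under its number.

```php
<?php
// Sketch: read the named groups from the pattern in this answer.
$re = '/^(?<before>[a-zA-Z]+)?\s?(?<number>[0-9].*?)\s?(?<after>[a-zA-Z]+)?$/';

if (preg_match($re, 'from 8 000 packs', $m)) {
    $before = $m['before'];
    $num    = str_replace(' ', '', $m['number']); // "8 000" -> "8000"
    $after  = $m['after'] ?? '';                  // omitted when the group does not participate
    echo "$before | $num | $after\n";
}
```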
Answer:
Your pattern does not match all the lines because it has 4 required parts, each of which expects a character to be present:
(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)
  ^^^^^^^^^^^^    ^^ ^^^^^    ^^
The other thing to note is that the first character class [0-9|a-zA-Z] can also match digits. You can omit the | from it, as inside a character class it matches a literal pipe char.
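To see the failure concretely, here is a small sketch that runs the pattern from the question against four of the sample lines. Because of the two required \s, a line needs at least two whitespace chars to match at all, and in "54 564 packs" the first group ends up capturing the digits 54.

```php
<?php
// Sketch: run the pattern from the question against some of the sample lines.
$re = '/(^[0-9|a-zA-Z].*?)\s([0-9].*?)\s([a-zA-Z]*$)/';

$results = [];
foreach (['from 8 000 packs', '432534534', '04 555', '54 564 packs'] as $line) {
    // keep the three capture groups, or null when the line does not match
    $results[$line] = preg_match($re, $line, $m) ? array_slice($m, 1) : null;
}
print_r($results);
```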
If you want to allow any chars other than digits on the left and right, and there should be at least a single digit present, you can use a negated character class [^\d\r\n]* that optionally matches any character except a digit or a newline:
^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$
- ^: Start of string (start of each line with the m flag)
- ([^\d\r\n]*): Capture group 1, match any char except a digit or a newline
- \h*: Match optional horizontal whitespace chars
- (\d+(?:\h+\d+)*): Capture group 2, match 1+ digits and optionally repeat 1+ horizontal whitespace chars followed by 1+ digits
- \h*: Match optional horizontal whitespace chars
- ([^\d\r\n]*): Capture group 3, match any char except a digit or a newline
- $: End of string (end of each line with the m flag)
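The pattern uses \h rather than \s for the optional whitespace. A small sketch of the difference: \h matches horizontal whitespace such as a space or a tab, while \s also matches a newline, which matters when a multi-line text is matched line by line with the m flag.

```php
<?php
// Sketch: \h matches only horizontal whitespace, \s also matches line breaks.
foreach ([' ' => 'space', "\t" => 'tab', "\n" => 'newline'] as $char => $name) {
    // preg_match() returns 1 on a match and 0 otherwise
    printf("%-8s \\h:%d \\s:%d\n", $name, preg_match('/\h/', $char), preg_match('/\s/', $char));
}
```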
See a regex demo and a PHP demo.
For example
$re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/m';
$str = 'from 8 000 packs
test from 8 000 packs test
432534534
from 344454 packs
45054 packs
04 555
434654
54 564 packs';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
    list(, $before, $num, $after) = $match;
    // remove the horizontal whitespace between the digit groups
    printf(
        "before: %s\nnum:%s\nafter:%s\n--------------------\n",
        $before, preg_replace('/\h+/', '', $num), $after
    );
}
Output
before: from
num:8000
after:packs
--------------------
before: test from
num:8000
after:packs test
--------------------
before:
num:432534534
after:
--------------------
before: from
num:344454
after:packs
--------------------
before:
num:45054
after:packs
--------------------
before:
num:04555
after:
--------------------
before:
num:434654
after:
--------------------
before:
num:54564
after:packs
--------------------
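If you want the three parts in variables exactly as the question asks, the pattern can be wrapped in a helper. This is a sketch: parseQuantity() is a hypothetical name, and the trim() calls are an addition that removes the space group 1 picks up before the number (for example 'from ' becomes 'from').

```php
<?php
// Sketch: return [before, num, after] for one line, or null when it does not match.
// parseQuantity() is a hypothetical helper name, not part of the original answer.
function parseQuantity(string $line): ?array
{
    $re = '/^([^\d\r\n]*)\h*(\d+(?:\h+\d+)*)\h*([^\d\r\n]*)$/';
    if (!preg_match($re, $line, $m)) {
        return null; // no digits, or digit groups separated by other chars
    }
    // group 3 can match an empty string, so $m[3] is always set here
    return [trim($m[1]), preg_replace('/\h+/', '', $m[2]), trim($m[3])];
}

[$before, $num, $after] = parseQuantity('from 8 000 packs');
echo "$before|$num|$after\n";
```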
If there should be at least a single digit present and the only allowed characters for the word(s) are a-z, you can use a case-insensitive pattern:
(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$
See another regex demo and a PHP demo.
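A short sketch of how the case-insensitive variant behaves on a few inputs; the test strings are made up for illustration. Since each word group has to end with a letter, the spaces around the number are consumed by \h* instead of ending up in groups 1 and 3, and a line with other characters in the words is rejected.

```php
<?php
// Sketch: try the case-insensitive variant on a few lines.
$re = '/(?i)^((?:[a-z]+(?:\h+[a-z]+)*)?)\h*(\d+(?:\h+\d+)*)\h*((?:[a-z]+(?:\h+[a-z]+)*)?)?$/';

$results = [];
foreach (['From 8 000 Packs', '04 555', 'from 8 000 packs!'] as $line) {
    // group 3 may be omitted from $m when it does not take part in the match
    $results[$line] = preg_match($re, $line, $m) ? [$m[1], $m[2], $m[3] ?? ''] : null;
}
print_r($results);
```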