I am trying to extract a user code from a package label. The label contains an address and a code that is either 6 digits or a 2-letter prefix followed by 6 digits. First I take an image of the label, then I extract its text with AWS Textract. However, the text can sometimes contain another, unrelated 6-digit code. I tried the regex (\s\d{6}\s)|((\.)\d{6}\s)|(\s[a-zA-Z]{2}\d{6}\s) with preg_match_all. Is there any solution that can help me find that code? Note that the address is always the same, so maybe there is a function that can search near that address?
Example labels (the code being searched for is marked with --> <--):
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt --> 913847 <-- ut labore et dolore magna aliqua.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua --> 913847 <--.
--> TK913847 <-- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
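Why the pattern above misses some codes can be checked directly: every alternative requires whitespace (or a dot) before the code and whitespace after it, so a code at the very start of the text or followed by a period is skipped. The sketch below runs the question's pattern and a \b-based variant over the three example labels (with the --> <-- markers removed). The helper name countMatches is only for illustration.

```php
<?php
// The three example labels from the question, without the --> <-- markers.
$labels = [
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt 913847 ut labore et dolore magna aliqua.',
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua 913847.',
    'TK913847 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.',
];

// Pattern from the question: each alternative needs whitespace (or a dot)
// before the code AND whitespace after it.
$original = '/(\s\d{6}\s)|((\.)\d{6}\s)|(\s[a-zA-Z]{2}\d{6}\s)/';

// Same idea with word boundaries, which also hold at the start/end of the
// text and next to punctuation.
$withBoundaries = '/\b(?:[a-zA-Z]{2})?\d{6}\b/';

// Hypothetical helper: number of matches of $pattern in each label.
function countMatches(string $pattern, array $labels): array
{
    $counts = [];
    foreach ($labels as $label) {
        $counts[] = preg_match_all($pattern, $label);
    }
    return $counts;
}

echo implode(', ', countMatches($original, $labels)), PHP_EOL;       // 1, 0, 0
echo implode(', ', countMatches($withBoundaries, $labels)), PHP_EOL; // 1, 1, 1
```

The second and third labels fail with the original pattern only because of what surrounds the code, not because of the code itself.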
Answer:
Use
preg_match_all('/address\s+\K(?:[A-Z]{2})?\d{6}\b/i', $string, $matches);
where address stands for the static address text on the label.
Note: use preg_match_all, not preg_match, to get all matches from your text.
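A minimal usage sketch of this pattern (with \s+ between the address and the code), assuming the literal address in it is replaced by the static address printed on the label. The OCR text and the address Main Street 12 are made up for illustration, and preg_quote() is used so that characters such as . or - in a real address are not treated as regex syntax.

```php
<?php
// Hypothetical OCR output: "Main Street 12" stands in for the static address,
// and 550123 is an unrelated 6-digit number elsewhere on the label.
$text = "Main Street 12\nTK913847\nLorem ipsum 550123 dolor sit amet.";

// Escape the address so it is matched literally, then look for the code
// right after it.
$address = 'Main Street 12';
$pattern = '/' . preg_quote($address, '/') . '\s+\K(?:[A-Z]{2})?\d{6}\b/i';

preg_match_all($pattern, $text, $matches);

// \K drops the address and the whitespace from the reported match,
// so $matches[0] holds only the code itself.
echo implode(', ', $matches[0]), PHP_EOL; // TK913847
```

Because the match has to start at the address, the unrelated 550123 is ignored.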
EXPLANATION
address      'address'
\s+          whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
\K           match reset operator (everything matched before it is left out of the returned match)
(?:          group, but do not capture (optional (matching the most amount possible)):
  [A-Z]{2}     any character of: 'A' to 'Z', 'a' to 'z' (2 times; lowercase too because of the i flag)
)?           end of grouping
\d{6}        digits (0-9) (6 times)
\b           the boundary between a word char (\w) and something that is not a word char
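If the code is not always directly after the address, the "search near the address" idea can be sketched with match offsets instead. This assumes the candidate closest to the address is the right one: stripos() finds the address, preg_match_all() with PREG_OFFSET_CAPTURE returns every candidate with its byte offset, and the candidate with the smallest gap to the address wins. The function findCodeNearAddress and the sample text are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns the code candidate closest to the static
 * address, or null if the address or any candidate is missing.
 * Offsets from stripos() and PREG_OFFSET_CAPTURE are byte offsets.
 */
function findCodeNearAddress(string $text, string $address): ?string
{
    $addressStart = stripos($text, $address);
    if ($addressStart === false) {
        return null;
    }
    $addressEnd = $addressStart + strlen($address);

    // Every standalone 6-digit code, optionally with a 2-letter prefix,
    // together with its byte offset in $text.
    preg_match_all('/\b(?:[A-Za-z]{2})?\d{6}\b/', $text, $matches, PREG_OFFSET_CAPTURE);

    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($matches[0] as [$candidate, $offset]) {
        $candidateEnd = $offset + strlen($candidate);
        if ($offset >= $addressEnd) {
            $distance = $offset - $addressEnd;          // code after the address
        } elseif ($candidateEnd <= $addressStart) {
            $distance = $addressStart - $candidateEnd;  // code before the address
        } else {
            continue;                                   // digits inside the address itself
        }
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }
    return $best;
}

$text = "Ref 550123\nLorem ipsum dolor sit amet\nMain Street 12, Springfield\nTK913847";
echo findCodeNearAddress($text, 'Main Street 12, Springfield'), PHP_EOL; // TK913847
```

Candidates that overlap the address (for example a 6-digit postal code inside it) are skipped, so they can never win.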
Answer:
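Looks like this would do all you need:
preg_match_all('/\b((?:[A-Za-z]{2})?\d{6})\b/', $input, $matches);
The prefix is [A-Za-z]{2} rather than [\w]{2}, because \w also matches digits and underscores, which would let an 8-digit number through as a code.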
Matches the following code examples:
- AA123456
- bb123456
- 123456
But it won't match a code that is part of a longer term, thanks to the \b word boundaries, for example:
- lorem123456
- code123456aa
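A quick check of the examples above, with the prefix restricted to letters. The last sample, 12123456, is an added 8-digit case that should not be reported as a code.

```php
<?php
// Sample terms from the answer, one per line, plus an 8-digit number.
$input = "AA123456\nbb123456\n123456\nlorem123456\ncode123456aa\n12123456";

// Optional 2-letter prefix, then 6 digits, bounded by \b on both sides.
preg_match_all('/\b((?:[A-Za-z]{2})?\d{6})\b/', $input, $matches);

// $matches[1] holds the captured code for each match.
echo implode(', ', $matches[1]), PHP_EOL; // AA123456, bb123456, 123456
```

lorem123456 and code123456aa are rejected because there is no word boundary inside a word, and 12123456 is rejected because 6 digits followed by another digit fails the closing \b.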