How to get a specific user code from an address label

I am trying to extract a user code from a package label. The label contains an address and a code of 6 digits, sometimes preceded by a 2-letter prefix. First I get the label image, then I extract its text with AWS Textract. The text can sometimes contain another, unrelated 6-digit number. I tried the regex (\s\d{6}\s)|((\.)\d{6}\s)|(\s[a-zA-Z]{2}\d{6}\s) with preg_match_all. Is there a solution that can help me find that code? Note: the address is always static, so maybe there is a function that can search near that address?

Example label text. The code I am searching for is marked with --> <--:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt --> 913847 <-- ut labore et dolore magna aliqua.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua --> 913847 <--.
--> TK913847 <-- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

CodePudding user response：

Use

preg_match_all('/address\s+\K(?:[A-Z]{2})?\d{6}\b/i', $string, $matches);

Note: Not preg_match. Use preg_match_all to get all matches from your text.

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  address                  'address'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [A-Z]{2}                 any character of: 'A' to 'Z', 'a' to 'z' (2 times)
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \d{6}                    digits (0-9) (6 times)
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
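
A minimal sketch of the pattern in use. It assumes the static address text ends with the literal word "address" right before the code; the sample string and variable names are made up for illustration, so replace "address" with whatever fixed text actually precedes the code on your label.

```php
<?php
// Hypothetical OCR output: the static address text is assumed to end with the word "Address".
$string = "Order 555123 shipped. Sender Address TK913847 Lorem ipsum. Return address 913847.";

// \s+ requires whitespace after "address"; \K drops everything before it from the reported match.
preg_match_all('/address\s+\K(?:[A-Z]{2})?\d{6}\b/i', $string, $matches);

// $matches[0] holds the full matches: the codes that directly follow "address".
print_r($matches[0]);
```

The unrelated number 555123 is skipped because it is not preceded by the anchor word, which is the point of searching near the static address.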

CodePudding user response：

Looks like this would do all you need:

preg_match_all('#\b(([\w]{2})?[\d]{6})\b#ms', $input, $matches);

Matches the following code examples:

AA123456
bb123456
123456

But it won't match when the code is part of a longer term, thanks to the word boundaries, such as:

lorem123456
code123456aa
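
A quick way to check these cases, as a sketch. One caveat: \w also matches digits and underscores, so the pattern accepts an 8-digit number such as 12345678 as well. The stricter variant below assumes the prefix is always two ASCII letters, which the question suggests but does not guarantee; the input string is invented for the test.

```php
<?php
// Pattern from the answer above.
$pattern = '#\b(([\w]{2})?[\d]{6})\b#ms';
// Stricter variant (assumption: the optional prefix is exactly two ASCII letters).
$strict = '#\b(?:[A-Za-z]{2})?\d{6}\b#';

$input = "AA123456 bb123456 123456 lorem123456 code123456aa 12345678";

preg_match_all($pattern, $input, $loose);
preg_match_all($strict, $input, $tight);

print_r($loose[1]); // group 1 holds the whole code, including 12345678
print_r($tight[0]); // the strict variant rejects 12345678
```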