Regex is not working as expected - determining if a path is correct

Time:10-06

I am having a hard time finishing a regex to catch a correct pattern in my files.

This is the PHP code I am running:

$imagesToAdd = [];
foreach (glob($imagePath.'/'.$groupCode.'*.{jpg,png}', GLOB_BRACE) as $filepath) {
    $match = (int) preg_match('#'.$imagePath.'/'.$groupCode.'+([^\D]|[-_\d]?)+.(png|jpg)#', $filepath);
    print $filepath . PHP_EOL;
    print $imagePath.'/'.$groupCode.'+(|[\D]|[-_\d])+.(png|jpg)' . PHP_EOL;
    var_dump($match);
    if ($match !== 0) {
        $imagesToAdd[] = $filepath;
    }
}

The regex here is '#'.$imagePath.'/'.$groupCode.'+([^\D]|[-_\d]?)+.(png|jpg)#'. Let's use some examples to make clear what my objective is and why it's still failing.

Let's say the regex resolves to this:

\/home\/shop\/test\/release\/20211005131411\/pub\/media\/productImages\/29192-7+([^\D]|[-_\d]?)+.(png|jpg)

It correctly matches the provided path, as expected. But here is my problem.

My $groupCode can be 1, 1-1, 1123123123-1, or simply 1123123123. In any case, the group code 1 should not match a path that ends in 123123123-1.png or 12111111.jpg.

So, if the $groupCode is 1, it should only match 1.jpg or 1.png or 1-1.png or 1-9999.jpg.

If the $groupCode is 123-1 it should only match 123-1.jpg or 123-1.png or 123-1-anyothernumber.(png|jpg). What do I have to do in order to get the Regex straight here?

My current approach does not work completely and I am running out of ideas.

CodePudding user response:

First replace the glob() with some static data to get a reproducible example. It is really difficult for us to debug something if we do not have all parts.

$data = [
    '/base-path/42.png',
    '/base-path/42-1.png',
    '/base-path/42.PNG',
    '/base-path/21.png'
];
$imagePath = '/base-path';
$groupCode = '42';

Some things to look out for:

  • If you put a variable value into a regular expression, you will need to escape the special characters that might be in it - use preg_quote().
  • Validation patterns need to be anchored, otherwise there can be content before and after the match - use ^ and $.
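
A small sketch of both points. The value of `$code` and the file names are made up for illustration and do not come from the question:

```php
<?php
// Hypothetical value that contains a regex metacharacter (the dot).
$code = '1.5';

// Without escaping, "." matches any character.
$unescaped = '(^' . $code . '\.png$)';
// With preg_quote(), "." is matched literally.
$escaped = '(^' . preg_quote($code) . '\.png$)';

var_dump(preg_match($unescaped, '1x5.png')); // int(1): false positive
var_dump(preg_match($escaped, '1x5.png'));   // int(0)
var_dump(preg_match($escaped, '1.5.png'));   // int(1)

// Without anchors, the pattern may match anywhere inside the subject.
var_dump(preg_match('(1\.png)', '21.png'));   // int(1): matches the tail "1.png"
var_dump(preg_match('(^1\.png$)', '21.png')); // int(0)
```

The unanchored case is the one described in the question: a group code of 1 also hits files that merely end in 1.png.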

If I understand correctly you're trying to match:

{path}/{code}.{image-extension} and {path}/{code}-{digits}.{image-extension}

  • Digit: \d
  • Digits (at least one): \d+
  • Prefixed with -: -\d+
  • Optional: (-\d+)?
  • With image file extension: (-\d+)?\.(jpg|png)
  • Case insensitive extensions: (-\d+)?\.((?i)jpg|png)
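
To check the suffix part in isolation, here is a minimal sketch. The literal prefix `img` is a placeholder standing in for the quoted path and code:

```php
<?php
// Only the extension group is case-insensitive; the rest stays case-sensitive.
// "img" is a hypothetical prefix used for illustration.
$pattern = '(^img(-\d+)?\.((?i)jpg|png)$)';

var_dump(preg_match($pattern, 'img.JPG'));    // int(1)
var_dump(preg_match($pattern, 'img-12.Png')); // int(1)
var_dump(preg_match($pattern, 'IMG.jpg'));    // int(0): "img" is still case-sensitive
var_dump(preg_match($pattern, 'img-.jpg'));   // int(0): "-" needs at least one digit
```

The inline (?i) applies from where it appears to the end of the enclosing group, so it covers both alternatives but not the text before the group.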

Put together:

$pattern = '(^'.preg_quote($imagePath.'/'.$groupCode).'(-\d+)?\\.((?i)png|jpg)$)';
var_dump($pattern);

$imagesToAdd = [];
foreach ($data as $file) {
    echo 'Validating: ', $file, PHP_EOL;
    $found = preg_match($pattern, $file);
    echo $found ? ' - HIT' : ' - MISS', PHP_EOL;
    if ($found) {
        $imagesToAdd[] = $file;
    }
}

var_dump($imagesToAdd);

Output:

string(40) "(^/base\-path/42(-\d+)?\.((?i)png|jpg)$)"
Validating: /base-path/42.png
 - HIT
Validating: /base-path/42-1.png
 - HIT
Validating: /base-path/42.PNG
 - HIT
Validating: /base-path/21.png
 - MISS
array(3) {
  [0]=>
  string(17) "/base-path/42.png"
  [1]=>
  string(19) "/base-path/42-1.png"
  [2]=>
  string(17) "/base-path/42.PNG"
}
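
The same pattern can be tried against the group codes from the question. This is a sketch only: `filterGroupImages()` is a hypothetical helper, not part of the code above, and the directory `/base-path` is made up. The file names are the ones the question mentions.

```php
<?php
// Hypothetical helper: builds the anchored pattern for one group code
// and returns the paths that match it.
function filterGroupImages(array $paths, string $imagePath, string $groupCode): array
{
    $pattern = '(^' . preg_quote($imagePath . '/' . $groupCode) . '(-\d+)?\.((?i)png|jpg)$)';
    // preg_grep() keeps the matching entries; array_values() reindexes them.
    return array_values(preg_grep($pattern, $paths));
}

// File names from the question's examples, under an assumed directory.
$paths = [
    '/base-path/1.jpg',
    '/base-path/1-1.png',
    '/base-path/1-9999.jpg',
    '/base-path/123123123-1.png',
    '/base-path/12111111.jpg',
    '/base-path/123-1.png',
    '/base-path/123-1-7.jpg',
];

// Group code 1: keeps 1.jpg, 1-1.png and 1-9999.jpg only.
print_r(filterGroupImages($paths, '/base-path', '1'));
// Group code 123-1: keeps 123-1.png and 123-1-7.jpg only.
print_r(filterGroupImages($paths, '/base-path', '123-1'));
```

In the original loop, the result of glob() could presumably be passed as `$paths` in place of the static list.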