I have found many topics on extracting all URLs from a string, and on detecting URLs with a specific pattern, but not on doing both. Sorry, I am a bit rusty with regex. Can someone please help?
Here is what I want:
$str = <<<EOF
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;
Basically, I want to extract all URLs inside the $str variable that contain /products/ in their path.
I tried this for the URL extraction: /\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i
But on top of this, I only want the URLs that contain that pattern, not the others.
CodePudding user response:
You can repeat all the allowed characters before and after matching /products/ by using the same optional character class on both sides. As the character class is quite long, you can shorten the notation by wrapping it in a capture group and recursing the first subpattern with (?1).
Note that you don't have to escape the forward slash if you use a different delimiter.
$re = '`\b(?:(?:https?|ftp)://|www\.)([-a-z0-9+&@#/%?=~_|!:,.;]*)/products/(?1)[-a-z0-9+&@#/%=~_|]`';
$str = <<<EOF
http://example.com/products/1/abc
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => http://example.com/products/1/abc
[1] => http://example.com/products/1
)
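To make the (?1) shorthand more concrete, here is a minimal sketch that compares the recursive pattern with a version where the character class is simply written out a second time. The names $short and $long are only illustrative, and the + inside the character classes is an assumption about the intended pattern. Note that (?1) re-runs the subpattern of group 1; it does not require the same text to match again.

```php
<?php
// Illustrative sketch: $short and $long are hypothetical names.
// $short uses (?1) to reuse the sub-pattern of capture group 1.
$short = '`\b(?:(?:https?|ftp)://|www\.)([-a-z0-9+&@#/%?=~_|!:,.;]*)/products/(?1)[-a-z0-9+&@#/%=~_|]`';
// $long spells the same character class out a second time instead.
$long = '`\b(?:(?:https?|ftp)://|www\.)[-a-z0-9+&@#/%?=~_|!:,.;]*/products/[-a-z0-9+&@#/%?=~_|!:,.;]*[-a-z0-9+&@#/%=~_|]`';

$str = <<<EOF
http://example.com/products/1/abc
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;

preg_match_all($short, $str, $a);
preg_match_all($long, $str, $b);

// Compare the full matches found by both patterns.
var_dump($a[0] === $b[0]);
```

On a recent PHP build both patterns should find the same URLs. PCRE2 releases before 10.30 treated subroutine calls such as (?1) as atomic, so the recursive form may not backtrack the same way on older setups; in that case the written-out form is the safer choice.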
CodePudding user response:
Besides the answer from "The fourth bird", I am proposing a hybrid solution that combines a regex with classic string operations. It provides a helper function with an extra option, so you can get different results at runtime without changing the regex.
<?php
function GetURL($str, $pattern='/products/')
{
    $temp = array();
    // Extract every http(s) URL first
    preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $str, $match);
    foreach ($match[0] as $link)
    {
        // No pattern given: keep every URL; otherwise keep only URLs containing $pattern
        if(!$pattern)
            array_push($temp, $link);
        else if(strpos($link, $pattern) !== false)
            array_push($temp, $link);
    }
    return $temp;
}
$str = <<<EOF
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;
print_r(GetURL($str)); //Urls only with /products/ inside
print_r(GetURL($str, '/order/')); //Urls only with /order/ inside
print_r(GetURL($str, false)); //All urls
?>
OUTPUT
Array ( [0] => http://example.com/products/1 )
Array ( [0] => http://example.com/order/1 )
Array (
[0] => http://example.com/products/1
[1] => http://example.com/order/1
)
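The same extract-then-filter idea can also be sketched with preg_grep(), which keeps only the array entries matching a second pattern. This is a rough variation, not the code from either answer: the function name filterUrls and its parameters are hypothetical, and the URL pattern is assumed to be the one from the question with + inside the character classes.

```php
<?php
// Hypothetical helper: extract every URL first, then filter with preg_grep().
function filterUrls(string $text, ?string $filter = '#/products/#'): array
{
    // Assumed URL pattern, based on the one in the question.
    $urlPattern = '`\b(?:(?:https?|ftp)://|www\.)[-a-z0-9+&@#/%?=~_|!:,.;]*[-a-z0-9+&@#/%=~_|]`i';
    preg_match_all($urlPattern, $text, $matches);

    // A null filter means "return every URL".
    if ($filter === null) {
        return $matches[0];
    }

    // preg_grep() preserves the original keys, so re-index the result.
    return array_values(preg_grep($filter, $matches[0]));
}

$str = <<<EOF
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;

print_r(filterUrls($str));             // URLs containing /products/
print_r(filterUrls($str, '#/order/#')); // URLs containing /order/
print_r(filterUrls($str, null));       // all URLs
```

Because the filter here is itself a regex, it can express conditions that strpos() cannot, such as requiring a numeric ID after /products/.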