Regex to remove numbers if not in a URL

I'm using this regex to remove numbers from a string, and it works fine for that:

$story = preg_replace('/\ ?[0-9][0-9()\-\s ]{4,20}[0-9]/', '', $story);

My problem is that I do not want to remove numbers if they are part of a URL, for example:

domain.com/page/?id=article_1625463

Is it possible to remove them only from the other parts of the string?

So this:

Call our company on 3453453454 or 0045 345 532 34 or visit our website domain.com/page/?id=article_1625463

Becomes:

Call our company on or or visit our website domain.com/page/?id=article_1625463
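For reference, a minimal reproduction of the problem: the regex from the question applied unchanged to the example string. The first two matches are the phone numbers, but a third match removes the digits at the end of the URL as well.

```php
<?php
$story = "Call our company on 3453453454 or 0045 345 532 34 or visit our website domain.com/page/?id=article_1625463";

// The regex from the question, applied to the whole string
$story = preg_replace('/\ ?[0-9][0-9()\-\s ]{4,20}[0-9]/', '', $story);

echo $story, "\n";
// prints: Call our company on or or visit our website domain.com/page/?id=article_
```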

Answer:

For an example like this, a regex may be overkill, since the numbers are so obvious.

Since you also tagged PHP, you can use a simpler approach:

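<?php
$string = "Call our company on 3453453454 or 0045 345 532 34 or visit our website domain.com/page/?id=article_1625463";

// Split the sentence into words on single spaces
$myArray = explode(' ', $string);

// Keep only the words that are not purely numeric
$constructedSentence = [];
foreach ($myArray as $value) {
    if (!is_numeric($value)) {
        $constructedSentence[] = $value;
    }
}

// Join the remaining words back into a sentence
echo implode(' ', $constructedSentence);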

Basically, you split the sentence into parts using a space as the separator and check whether each part is purely numeric. If it is, you skip it; at the end, you join the remaining parts back into a sentence.

You can extend this logic to check whether a part contains ".com" or anything else that indicates a URL: if it does, keep it as is; otherwise, strip the numbers.

But for your example, this PHP code does the job without any regex.
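The URL-aware extension described above might look like the following minimal sketch. It assumes a simple heuristic for spotting URLs (a word that contains "://", starts with "www.", or contains ".com"); the helper names looksLikeUrl and removeNumbersOutsideUrls are hypothetical, and the heuristic will miss other URL forms.

```php
<?php
// Hypothetical heuristic: a word counts as a URL if it contains "://",
// starts with "www.", or contains ".com". Adjust it to your own data.
function looksLikeUrl(string $word): bool
{
    return strpos($word, '://') !== false
        || stripos($word, 'www.') === 0
        || stripos($word, '.com') !== false;
}

function removeNumbersOutsideUrls(string $text): string
{
    $kept = [];
    foreach (explode(' ', $text) as $word) {
        if (looksLikeUrl($word)) {
            // URL-like word: keep it exactly as written
            $kept[] = $word;
            continue;
        }
        // Any other word: strip every run of digits
        $stripped = preg_replace('/[0-9]+/', '', $word);
        // Drop the word entirely if nothing is left (e.g. "3453453454")
        if ($stripped !== '') {
            $kept[] = $stripped;
        }
    }
    return implode(' ', $kept);
}

echo removeNumbersOutsideUrls(
    "Call our company on 3453453454 or 0045 345 532 34 or visit our website domain.com/page/?id=article_1625463"
), "\n";
```

Unlike the is_numeric version, this also strips digits embedded in ordinary words (for example "12abc" becomes "abc"), which may or may not be what you want.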

Answer:

A regex for this has the following general shape:

<URL_REGEX>(*SKIP)(*F)|<YOUR_REGEX>

If we assume that each URL starts with http and runs up to the next whitespace or the end of the string, you can use:

preg_replace('/http\S*(*SKIP)(*F)|\s*\ ?[0-9][0-9()\-\s ]{4,20}[0-9]/i', '', $story)

See the regex demo.

Here, http\S*(*SKIP)(*F)| matches http followed by zero or more non-whitespace characters. (*F) then forces the match to fail, and (*SKIP) makes the regex engine resume looking for the next match from the position where the failure occurred, i.e. right after the URL. As a result, the \s*\ ?[0-9][0-9()\-\s ]{4,20}[0-9] part never matches inside URLs.
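Note that the URL in the question, domain.com/page/?id=article_1625463, does not start with http, so the pattern above would still strip 1625463 from it. A minimal sketch that also skips scheme-less URLs, assuming they look like name.tld/... (a hypothetical heuristic; the function name stripPhoneNumbers is also just for illustration). The ~ delimiter is used because the URL alternative contains a slash.

```php
<?php
// Hypothetical URL heuristic: the answer's http\S* plus a scheme-less
// "name.tld/..." token such as domain.com/page/?id=article_1625463.
// Whatever the URL alternative matches is discarded by (*SKIP)(*F), and the
// search resumes after it, so the number alternative never runs inside a URL.
function stripPhoneNumbers(string $story): string
{
    $pattern = '~(?:http\S*|\S+\.[a-z]{2,}/\S*)(*SKIP)(*F)'
             . '|\s*\ ?[0-9][0-9()\-\s ]{4,20}[0-9]~i';
    return preg_replace($pattern, '', $story);
}

echo stripPhoneNumbers(
    "Call our company on 3453453454 or 0045 345 532 34 or visit our website domain.com/page/?id=article_1625463"
), "\n";
```

Any token shaped like name.tld/ is treated as a URL here, so tighten the URL alternative if your text contains such tokens that are not URLs.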