Alternative to xpath with regex

Time:12-07

I have some HTML pages with different phone numbers on them. Example:

<p style="text-align: center;">(xxx) xxxx xxxx</p>
<span style="text-align: center;">xxxxxxxxxx</span>
<li style="text-align: center;">(xxx) x xxx xxxx</li>
<p style="text-align: left;">xxxxx xxxx</p>

I would like to know the best way to change or even remove them using PHP.

My main idea would be to use XPath with a regex to find the text, but I believe regex doesn't work with XPath.

CodePudding user response:

I'm not familiar with XPath, but I found a nice article that can help you: Use PHP Functions in XPath Expressions.

You need to create a function that does the work, using preg_match_all, preg_match, or preg_replace.

Then define a variable that contains your HTML code:

$YourHtmlCode = <<<HTML
<p style="text-align: center;">(xxx) xxxx xxxx</p>
    <span style="text-align: center;">xxxxxxxxxx</span>
    <li style="text-align: center;">(xxx) x xxx xxxx</li>
    <p style="text-align: left;">xxxxx xxxx</p>
HTML;

Convert your HTML text to a DOMDocument like this:

$dom = new DOMDocument;
$dom->loadHTML($YourHtmlCode, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);

After that, use registerPhpFunctions so the function you created above can be called from an XPath expression.
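
As a rough sketch of how these steps could fit together: the helper name looksLikePhone, its pattern, the sample markup (with digits instead of x, wrapped in a single div), and the "[removed]" placeholder are all assumptions made for illustration, not part of the question. The php namespace has to be registered before php:functionString can be used in the query.

```php
<?php
// Hypothetical helper: true when the text consists only of digits, spaces,
// parentheses, dots and dashes and is at least 9 characters long.
// The pattern is an assumption; adjust it to your real phone formats.
function looksLikePhone($text)
{
    return preg_match('/^[\d() .-]{9,}$/', trim($text)) === 1;
}

// Sample markup (assumed), wrapped in one root element.
$html = '<div><p>(123) 4567 8901</p><span>1234567890</span><p>Hello world</p></div>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
// Make PHP callbacks available inside XPath expressions.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('looksLikePhone');

// Select every text node for which the PHP function returns true.
$nodes = $xpath->query('//text()[php:functionString("looksLikePhone", .)]');

foreach ($nodes as $node) {
    // Replace the number; use an empty string to remove it instead.
    $node->nodeValue = '[removed]';
}

echo $dom->saveHTML();
```

Because the match happens on text nodes, the tags and their attributes stay untouched; only the text that the helper accepts is changed.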


I use (?<=>)(.*?)(?=<) to match all the text between a > and the next < character.

You can do it like this to get all the parts.

<?php

// Match any text between a ">" and the next "<" on the same line.
$reg = '/(?<=\>)(.*?)(?=\<)/m';
$str = '<p style="text-align: center;">(xxx) xxxx xxxx</p>
<span style="text-align: center;">xxxxxxxxxx</span>
<li style="text-align: center;">(xxx) x xxx xxxx</li>
<p style="text-align: left;">xxxxx xxxx</p>';

// PREG_SET_ORDER groups the results per match; $val[0] is the full matched text.
preg_match_all($reg, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
}

?>

After that, you can modify the value directly.

If you want to replace the value directly with the regex, you can use preg_replace.

For example:

<?php
$reg = '/(?<=\>)(.*?)(?=\<)/m';
$str = '<p style="text-align: center;">(xxx) xxxx xxxx</p>
<span style="text-align: center;">xxxxxxxxxx</span>
<li style="text-align: center;">(xxx) x xxx xxxx</li>
<p style="text-align: left;">xxxxx xxxx</p>';

echo preg_replace($reg, "ReplaceString", $str); 
?>
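
The preg_replace call above replaces every piece of text between tags. If only the phone numbers should change, one option is preg_replace_callback, which lets you decide per match. This is a minimal sketch: the sample markup (with digits instead of x), the inner phone pattern, and the "[phone removed]" placeholder are assumptions for illustration.

```php
<?php
// Same idea as above: text between ">" and the next "<" on one line.
$reg = '/(?<=>)(.*?)(?=<)/';
$str = '<p>(123) 4567 8901</p>
<span>Call us today</span>
<li>1234567890</li>';

$result = preg_replace_callback($reg, function ($m) {
    // Assumed rule: only digits, spaces, parentheses, dots and dashes,
    // at least 9 characters long, counts as a phone number.
    if (preg_match('/^[\d() .-]{9,}$/', $m[0])) {
        return '[phone removed]';
    }
    // Anything else is returned unchanged.
    return $m[0];
}, $str);

echo $result;
```

Returning an empty string from the callback instead of the placeholder would remove the number and keep the surrounding tag.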

CodePudding user response:

Here is an example using a regular expression. It also removes the surrounding tags.

((\ |\d|\(|(<.*?>))[\d\-\(\)\. ]{9,}(\.|\n| |<\/.*>)(?!(png|jpg|<)))
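
A small sketch of how this pattern could be applied with preg_replace. The sample input and the variable names are assumptions; the pattern itself is copied unchanged from above. Note that when a number is preceded by other text inside the same element, the pattern can remove the closing tag without the opening one, so check the output on your real pages.

```php
<?php
// Pattern from the answer, wrapped in "/" delimiters.
$pattern = '/((\ |\d|\(|(<.*?>))[\d\-\(\)\. ]{9,}(\.|\n| |<\/.*>)(?!(png|jpg|<)))/';

// Assumed sample input with digits instead of x.
$input = "<p>(123) 4567 8901</p>\n<p>Hello</p>";

// Remove each match, including the tags around the number.
$clean = preg_replace($pattern, '', $input);

echo $clean;
```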

