preg_replace to remove a div from a string

Time:03-29

I'm trying to remove an HTML element from a string.

I have the following preg_replace:

    $body = preg_replace('#<div  style="margin: 8px 0; clear: both;">(.*?)</div>#', '', $body);

But the preg_replace doesn't seem to work.

Here is the full code:

    $html = new DOMDocument();
    @$html->loadHtmlFile($url);
    $xpath = new DOMXPath($html);
    $nodelist = $xpath->query('//*[@]');
    $body = '';
    foreach ($nodelist as $n) {
        $body .= $html->saveHtml($n) . "\n";
    }

    $body = preg_replace('#<div  style="margin: 8px 0; clear: both;">(.*?)</div>#', '', $body);
    

The current output is this:

    <div >
    hello this is content
    <div  style="margin: 8px 0; clear: both;">
    <div><center><span style="font-size:11px; color: gray;"TEST</span></center>
    <b>TEST</b><br><br></div></div>
    <div >
        </ul></div><!-- AI CONTENT END 1 -->
    <div  style="margin-bottom:15px; font-weight: bold; text-align:center;">Tags: <a href="#" rel="tag">test</a> <a href="#" rel="tag">#tag</a></div>
    </div>

And my desired output is:

    <div >
    hello this is content
    </div>

I'd really appreciate any help. I'm sure there is an easier way to achieve this; I'm just not sure why my current method isn't working. Thank you.
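To see why the pattern fails, here is a minimal sketch that runs the question's exact pattern against a simplified, hypothetical sample shaped like the output above. Two separate things go wrong: without the `s` modifier, `.` does not match newlines, so the pattern never matches; with `s`, the lazy `(.*?)` stops at the first `</div>`, which closes the nested div, so one `</div>` is left behind.

```php
<?php
// Hypothetical, simplified sample shaped like the question's output.
$body = "<div>\nhello this is content\n"
      . "<div  style=\"margin: 8px 0; clear: both;\">\n"
      . "<div><b>TEST</b><br><br></div></div>\n"
      . "</div>";

// The pattern copied verbatim from the question (including the double space).
$pattern = '#<div  style="margin: 8px 0; clear: both;">(.*?)</div>#';

// 1) No "s" modifier: "." does not match "\n", and the opening tag is
//    followed by a newline, so nothing matches and $body is unchanged.
var_dump(preg_match($pattern, $body)); // int(0)

// 2) With "s": "." now crosses newlines, but the lazy (.*?) stops at the
//    FIRST </div>, which closes the inner <div>, leaving an extra </div>.
$fixed = preg_replace($pattern . 's', '', $body);
echo $fixed, "\n";
```

Adding `s` alone is therefore not enough: a lazy or greedy `.*` cannot tell which `</div>` closes which `<div>`.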

User response:

This is cheating a bit. The main problem with using regex to parse HTML is nested tags, which will drive you to madness. If you truly only need to keep the first <div> and the content that occurs before the second <div>, the following will work:

    if (preg_match('#<div >(.*)<div.*$#Us', $body, $matches)) {
        $body = '<div >' . $matches[1] . '</div>';
    }

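This works because we're only extracting the content we need and wrapping it in static markup. The U modifier makes the quantifiers lazy, so (.*) stops at the first following <div>, and s lets . match newlines.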
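Since the question already loads the page into a `DOMDocument`, one alternative (a different technique from the regex answers, not something either answer proposes) is to remove the element in the DOM, where nesting is handled by the parser. A minimal sketch, assuming reasonably well-formed markup: the sample HTML is a hypothetical, simplified stand-in, and the XPath matches the `style` attribute from the question verbatim. It removes only that one div; reproducing the exact desired output would also mean removing the other trailing elements. Real page markup (such as the stray `</ul>`) may make libxml emit warnings, which the question's code already suppresses with `@`.

```php
<?php
// Hypothetical, simplified stand-in for the real page markup.
$html = '<div>hello this is content'
      . '<div style="margin: 8px 0; clear: both;"><div><b>TEST</b></div></div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect every <div> whose style attribute matches exactly, copying the
// results into an array before modifying the tree.
$unwanted = iterator_to_array(
    $xpath->query('//div[@style="margin: 8px 0; clear: both;"]')
);
foreach ($unwanted as $node) {
    // removeChild drops the element together with its whole subtree.
    $node->parentNode->removeChild($node);
}

// The first <div> in document order is the outer wrapper.
$outer = $doc->getElementsByTagName('div')->item(0);
$result = $doc->saveHTML($outer);
echo $result, "\n";
```

Because the parser already knows which closing tag belongs to which element, the nested `<div>` inside the removed block needs no special handling, which is exactly where the regex approach breaks down.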

Foul

User response:

Depending on the input, something like this may work:

    <?php
    $input = '
    <div >
    hello this is content
    <div  style="margin: 8px 0; clear: both;">
    <div><center><span style="font-size:11px; color: gray;"TEST</span></center>
    <b>TEST</b><br><br></div></div>
    <div >
        </ul></div><!-- AI CONTENT END 1 -->
    <div  style="margin-bottom:15px; font-weight: bold; text-align:center;">Tags: <a href="#" rel="tag">test</a> <a href="#" rel="tag">#tag</a></div>
    </div>';
    // $1 = the first opening <div ...> plus the text up to the next tag;
    // $2 = the last </div> in the string (the greedy .* skips everything between).
    $data = preg_replace('#^.*?(<div[^>]+>[^<]+).*(</div>).*?$#s', '$1$2', $input);
    echo '<pre>';
    echo htmlentities($data);
    echo '</pre>';
    /*
    <div >
    hello this is content
    </div>
    */