I'm trying to remove an HTML element from a string. I have the following preg_replace:
$body = preg_replace('#<div style="margin: 8px 0; clear: both;">(.*?)</div>#', '', $body);
But the preg_replace doesn't seem to work. Here is the full code:
$html = new DOMDocument();
@$html->loadHtmlFile($url);
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( '//*[@]' );
$body = '';
foreach ($nodelist as $n) {
    $body .= $html->saveHtml($n) . "\n";
}
$body = preg_replace('#<div style="margin: 8px 0; clear: both;">(.*?)</div>#', '', $body);
The current output is this:
<div >
hello this is content
<div style="margin: 8px 0; clear: both;">
<div><center><span style="font-size:11px; color: gray;"TEST</span></center>
<b>TEST</b><br><br></div></div>
<div >
</ul></div><!-- AI CONTENT END 1 -->
<div style="margin-bottom:15px; font-weight: bold; text-align:center;">Tags: <a href="#" rel="tag">test</a> <a href="#" rel="tag">#tag</a></div>
</div>
And my desired output is:
<div >
hello this is content
</div>
I'd really appreciate any help. I'm sure there is an easier way to achieve this; I'm just not entirely sure why my current method is not working. Thank you.
CodePudding user response:
This is cheating a bit. The main problem with trying to use regex to parse HTML is nested tags, which will drive you to madness. If you truly only need to keep the first <div> and the content that occurs before the second <div>, the below will work.
preg_match('#<div >(.*)<div.*$#Us', $body, $matches);
$body = '<div >' . $matches[1] . '</div>';
... since we're just extracting the content we need, and inserting it into the content format that's static.
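If removing the styled element itself is the goal, nesting can be sidestepped by letting the DOM parser find and delete the node. The following is only a minimal sketch: the sample input is a tidied, shortened stand-in for the question's markup, the class name "content" is an assumption (the real attribute is not shown in the question), and it removes only the matching <div>, not the sibling content that follows it.

```php
<?php
// Hypothetical sample input; the class name "content" is assumed for illustration.
$body = '<div class="content">
hello this is content
<div style="margin: 8px 0; clear: both;">
<div><center><span>TEST</span></center>
<b>TEST</b><br><br></div></div>
</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
// These flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Copy the result to an array first, as a precaution, so that removing
// nodes cannot interfere with the iteration.
$targets = iterator_to_array(
    $xpath->query('//div[@style="margin: 8px 0; clear: both;"]')
);
foreach ($targets as $node) {
    // Removing the node also removes everything nested inside it.
    $node->parentNode->removeChild($node);
}

$cleaned = $doc->saveHTML();
echo $cleaned;
```

Because the parser tracks which closing tag belongs to which opening tag, the nested <div> inside the removed element goes with it, which is the part a lazy regex cannot get right.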
CodePudding user response:
Depending on the input, maybe something like this:
<?php
$input = '
<div >
hello this is content
<div style="margin: 8px 0; clear: both;">
<div><center><span style="font-size:11px; color: gray;"TEST</span></center>
<b>TEST</b><br><br></div></div>
<div >
</ul></div><!-- AI CONTENT END 1 -->
<div style="margin-bottom:15px; font-weight: bold; text-align:center;">Tags: <a href="#" rel="tag">test</a> <a href="#" rel="tag">#tag</a></div>
</div>';
$data = preg_replace('#^.*?(<div[^>]+>[^<]+).*(</div>).*?$#s', '$1$2', $input);
echo '<pre>';
echo htmlentities($data);
/*
<div >
hello this is content
</div>
*/
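The s modifier on the pattern above is worth a note, since its absence is one likely reason the preg_replace in the question leaves the string untouched: without it, "." does not match newlines, and the target <div> spans several lines. The sketch below uses a shortened, hypothetical input to show both that effect and the nesting problem that remains once s is added.

```php
<?php
// Shortened, hypothetical input: the target div spans several lines
// and contains a nested div.
$body = "<div >\nhello\n"
      . "<div style=\"margin: 8px 0; clear: both;\">\n"
      . "<div><b>TEST</b></div></div>\n"
      . "</div>";

$pattern = '#<div style="margin: 8px 0; clear: both;">(.*?)</div>#';

// Without the s modifier, "." stops at the newline right after the
// opening tag, so the pattern never matches and $body comes back unchanged.
$withoutS = preg_replace($pattern, '', $body);
var_dump($withoutS === $body); // bool(true)

// With s, the lazy (.*?) runs only to the FIRST </div>, which belongs to
// the nested div, so the outer div's own closing tag is left behind.
$withS = preg_replace($pattern . 's', '', $body);
echo $withS;
```

The second result still contains a stray </div>, which is why the answers here either extract the wanted part instead of deleting the unwanted one, or avoid regex altogether.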