I'd like to split a long text into chunks. I need to split by element class (the element can be h, p, span, div, or any other tag). So, for example, if I have a string like:
$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
I'd like to split it by the cut class into an array, keeping all the text.
Expected result:
$array = array(
    0 => 'Hi this is a long ',
    1 => '<span class="cut">string</span>',
    2 => ' and I need to ',
    3 => '<span class="cut">split it into chunks</span>',
    4 => ' and I need help for ',
    5 => '<span class="cut">this</span>'
);
I can't find any example on the web.
I found only this one, but it finds only the elements with the class and excludes all the other text, so I don't know if it is useful for my purpose:
$domdocument = new DOMDocument();
$domdocument->loadHTML($contenuto);
$a = new DOMXPath($domdocument);
// Select every element whose class list contains the token "cut"
$elements = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' cut ')]");
for ($i = $elements->length - 1; $i > -1; $i--) {
    var_dump($elements->item($i)->firstChild->nodeValue);
}
CodePudding user response:
We can try a preg_match_all regex approach here:
$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
preg_match_all("/<(\w+).*?>.*?<\/\\1>|.*?(?=<|$)/", $string, $matches);
$lines = $matches[0];
array_pop($lines); // drop the empty match produced at the end of the input
print_r($lines);
This prints:
Array
(
    [0] => Hi this is a long 
    [1] => <span class="cut">string</span>
    [2] =>  and I need to 
    [3] => <span class="cut">split it into chunks</span>
    [4] =>  and I need help for 
    [5] => <span class="cut">this</span>
)
The regex pattern used here says to match:
<(\w+).*?>   an opening HTML tag (the tag name is captured in group 1)
.*?          any content
<\/\\1>      the matching closing tag
|            OR
.*?          any other content until reaching, but not including
(?=<|$)      the next HTML tag or the end of the input
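Note that this pattern splits on every tag, not only on elements with the cut class. If only elements carrying that class should become separate chunks, a minimal sketch along the same lines might look like the following. The helper name splitByClass is hypothetical, and the sketch assumes that the class attribute is double-quoted and that a matching element does not contain another element with the same tag name:

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text
// split into chunks, where every element carrying $class is its own chunk.
function splitByClass(string $html, string $class): array
{
    $cls = preg_quote($class, '~');
    // A double-quoted class attribute whose token list contains $class.
    $attr = '\bclass="(?:[^"]*\s)?' . $cls . '(?:\s[^"]*)?"';
    // Alternative 1: a whole element with that class (tag name in group 1).
    // Alternative 2: one or more characters, stopping right before the
    //                opening tag of the next element with that class.
    $pattern = '~<(\w+)\b[^>]*' . $attr . '[^>]*>.*?</\1>'
             . '|(?:(?!<\w+\b[^>]*' . $attr . ').)+~s';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

$string = 'Hi this is a long <span class="cut">string</span> and I need to '
        . '<span class="cut">split it into chunks</span> and I need help for '
        . '<span class="cut">this</span>';
print_r(splitByClass($string, 'cut'));
```

For the sample string this should give the same six chunks as the expected result. Because the second alternative requires at least one character, no empty match appears at the end, so array_pop is not needed here. Tags without the class (for example <b>) stay inside the surrounding text chunk. For arbitrary or deeply nested HTML, a DOM parser is likely to be more reliable than a regex.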