Split a string by element class in PHP, keeping all the original text in an array

I'd like to split a long text into chunks, splitting on elements by class (the element can be an h, p, span, div or some other unknown tag). For example, given a string like:

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';

I'd like to split it on the elements with class "cut" into an array, keeping all of the text. Expected result:

$array = array(
   0 => 'Hi this is a long ',
   1 => '<span class="cut">string</span>',
   2 => ' and I need to ',
   3 => '<span class="cut">split it into chunks</span>',
   4 => ' and I need help for ',
   5 => '<span class="cut">this</span>'
);

I couldn't find any example on the web.

The only one I found selects elements by class but discards all the other text, and I don't know if it is useful for my purpose:

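$contenuto = $string; // the HTML to search
$domdocument = new DOMDocument();
$domdocument->loadHTML($contenuto);
$xpath = new DOMXPath($domdocument);
// Every element whose class list contains the whole word "cut"
$elements = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' cut ')]");

// Prints only the text of the matched elements (in reverse order); the text between them is lost
for ($i = $elements->length - 1; $i > -1; $i--) {
    var_dump($elements->item($i)->firstChild->nodeValue);
}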
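The DOM approach can also keep the surrounding text: instead of querying only the matching elements, walk the children of the fragment and collect text and non-matching nodes into a running chunk. This is a minimal sketch, assuming the "cut" elements are direct children of the fragment (not nested inside other tags) and that the input is ASCII, since loadHTML() does not assume UTF-8 by default. splitByClass is a hypothetical helper name.

```php
<?php
// Hypothetical helper: split an HTML fragment into plain-text chunks and the
// outer HTML of each top-level element whose class list contains $class.
// Assumes matching elements are direct children of the fragment (not nested)
// and that the input is ASCII (loadHTML() does not assume UTF-8 by default).
function splitByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment so all of its top-level nodes share one parent element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // The wrapper is the first <div> in document order, even if $html has its own divs.
    $root = $doc->getElementsByTagName('div')->item(0);

    $parts = [];
    $buffer = ''; // text and non-matching nodes seen since the last match
    foreach ($root->childNodes as $node) {
        $isMatch = $node instanceof DOMElement
            && in_array($class, preg_split('/\s+/', trim($node->getAttribute('class'))), true);

        if ($isMatch) {
            if ($buffer !== '') {
                $parts[] = $buffer;
                $buffer = '';
            }
            $parts[] = $doc->saveHTML($node); // outer HTML of the matching element
        } else {
            $buffer .= $doc->saveHTML($node); // text nodes come back HTML-escaped
        }
    }
    if ($buffer !== '') {
        $parts[] = $buffer;
    }
    return $parts;
}

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
$parts = splitByClass($string, 'cut');
print_r($parts);
```

Non-matching elements, such as a plain <em>, stay inside the surrounding text chunk, and an element with class="x cut" still matches because the class list is split on whitespace.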

Answer:

One option is a regex approach with preg_match_all:

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
preg_match_all("/<(\w+).*?>.*?<\/\\1>|.*?(?=<|$)/", $string, $matches);
$lines = $matches[0];
array_pop($lines); // the second alternative also matches an empty string at the end of the input
print_r($lines);
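A small variant of the same regex (my own tweak, not part of the answer as written) avoids the trailing empty match: [^<]+ matches one or more characters other than <, so it can never match an empty string and array_pop() is no longer needed. Like the original, it splits on any element whatever its class, and it silently skips a < that does not start a matched element (for example a void tag such as <br>).

```php
<?php
$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';

// Alternative 1: an element <tag ...>...</tag> (lazy, so nested same-name tags are not handled)
// Alternative 2: a run of one or more characters that are not '<'
$pattern = '~<(\w+)\b[^>]*>.*?</\1>|[^<]+~s';

preg_match_all($pattern, $string, $matches);
$lines = $matches[0]; // full matches only; $matches[1] holds the captured tag names
print_r($lines);
```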

This prints:

Array
(
    [0] => Hi this is a long 
    [1] => <span class="cut">string</span>
    [2] =>  and I need to 
    [3] => <span class="cut">split it into chunks</span>
    [4] =>  and I need help for 
    [5] => <span class="cut">this</span>
)

The regex pattern used here says to match:

<(\w+).*?>  an opening HTML tag, capturing the tag name
.*?         any content
<\/\\1>     the matching closing tag
|           OR
.*?         any other content up to, but not including,
(?=<|$)     the next HTML tag or the end of the input
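Note that this pattern splits on every element, whatever its class, while the question asks to split only on elements with class "cut". If that matters and you still prefer a regex over DOMDocument, one possible sketch is to find only the matching elements with PREG_OFFSET_CAPTURE and slice out the text between them. splitByClassRegex is a hypothetical name, and the pattern assumes double-quoted class attributes and no nested tags of the same name inside a matching element.

```php
<?php
// Hypothetical helper: split $html into text chunks and the elements whose
// double-quoted class attribute contains the whole word $class.
function splitByClassRegex(string $html, string $class): array
{
    $cls = preg_quote($class, '~');
    // <tag ... class="... cut ..." ...> ... </tag>  (lazy match, no nesting of the same tag)
    // The lookarounds stop "cut" from matching inside "cutting" or "pre-cut".
    $pattern = '~<(\w+)\b[^>]*\sclass="[^"]*(?<![\w-])' . $cls . '(?![\w-])[^"]*"[^>]*>.*?</\1>~s';
    preg_match_all($pattern, $html, $m, PREG_OFFSET_CAPTURE);

    $parts = [];
    $pos = 0; // byte offset just past the previous match
    foreach ($m[0] as [$element, $offset]) {
        if ($offset > $pos) {
            $parts[] = substr($html, $pos, $offset - $pos); // text before this element
        }
        $parts[] = $element;
        $pos = $offset + strlen($element);
    }
    if ($pos < strlen($html)) {
        $parts[] = substr($html, $pos); // trailing text after the last element
    }
    return $parts;
}

$string = 'Hi this is a long <span class="cut">string</span> and I need to <span class="cut">split it into chunks</span> and I need help for <span class="cut">this</span>';
print_r(splitByClassRegex($string, 'cut'));
```

Elements without the class, such as <span class="x">, are left inside the surrounding text chunk rather than split out.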