I want to split a string containing HTML tags into words. Since HTML tags are not words, they should be captured separately. I started with
$str='this is <a href="">something for test</a>';
print_r(preg_split('# (?![^<]{1,99}[>])#',$str));
Array
(
[0] => this
[1] => is
[2] => <a href="">something
[3] => for
[4] => test</a>
)
which keeps the HTML tags intact, but I need an additional split that separates the tags from the adjacent words, to produce
Array
(
[0] => this
[1] => is
[2] => <a href="">
[3] => something
[4] => for
[5] => test
[6] => </a>
)
CodePudding user response:
$str = 'this is <a href="">something for test</a>';
print_r(preg_split("/(<[^>]*[^\/]>)| /i", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
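To make the role of the two flags clearer, here is a minimal sketch that wraps the same pattern in a function. The name split_words_and_tags is hypothetical and only used for illustration. The pattern splits on either a captured tag or a single space: PREG_SPLIT_DELIM_CAPTURE returns the captured tag as its own element, and PREG_SPLIT_NO_EMPTY discards the empty strings left where a space sits directly next to a tag.

```php
<?php
// Hypothetical helper (not part of the original answer): splits a string
// into words and HTML tags, with each tag as a separate element.
function split_words_and_tags(string $str): array
{
    return preg_split(
        '/(<[^>]*[^\/]>)| /',   // group 1: a tag; alternative: a single space
        $str,
        -1,                      // no limit on the number of pieces
        PREG_SPLIT_DELIM_CAPTURE // keep the captured tag in the result
        | PREG_SPLIT_NO_EMPTY    // drop empty pieces around delimiters
    );
}

print_r(split_words_and_tags('this is <a href="">something for test</a>'));
```

Note that the pattern requires the character just before > not to be /, so a self-closing tag such as <br/> is not captured as a tag by this sketch.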