I have data that looks like this:
#1
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?
#2
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?
#3
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?
I need a regular expression that matches each multi-line entry so I can unwrap it to a single line.
I've tried this:
$pattern = '/^(\(?[a-z0-9]+\) )([\s\S]+?(?!#))(^\(?[a-z0-9]+\))/mS';
$text = preg_replace_callback ($pattern, function ($grp) {
return $grp[1] . unwrap ($grp[2]) . PHP_EOL . $grp[3];
}, $text);
I feel like this should be a simple regex to write, but I'm having trouble for some reason.
Answer:
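You can match every entry with a lazy quantifier followed by a lookahead, and then unwrap the whole match. In PHP, wrap the pattern in delimiters and add the m modifier so that ^ matches at the start of every line: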
'^\(\d+\)[^#]*?(?=\n\(\d\)|\Z|#)'
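A minimal sketch of how the regex above could be plugged into the question's preg_replace_callback() call. Assumptions: the pattern is wrapped as '/.../m', the sample is a shortened copy of the question's data, and unwrapEntry() is a hypothetical stand-in for the question's unwrap(), whose code isn't shown.

```php
<?php
// Sketch: put each numbered entry "(n) ..." on a single line.
// Pattern from this answer, wrapped in PHP delimiters with the /m modifier
// so that ^ matches at the start of every line.
$pattern = '/^\(\d+\)[^#]*?(?=\n\(\d\)|\Z|#)/m';

// Hypothetical stand-in for the question's unwrap() (not shown there):
// turn every line break inside an entry into a space, but keep a final
// line break (the last entry before a "#" marker is matched with one).
function unwrapEntry(string $entry): string
{
    return preg_replace('/\R(?!\z)/', ' ', $entry);
}

// Shortened sample taken from the question's data.
$text = <<<'TXT'
#1
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?
#2
(1) This is a test.
TXT;

// Each whole match is one entry; unwrap it in place.
$result = preg_replace_callback($pattern, function (array $m): string {
    return unwrapEntry($m[0]);
}, $text);

echo $result, PHP_EOL;
```

Entry (3) comes out on one line. Note that the a) and b) lines are folded into entry (1), because this pattern only stops at the next "(n)", at a "#", or at the end of the text.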
EDIT: from your question it's not clear how you want to handle sub-entries like a) and b). With the first regex they are treated as ordinary text, so they get joined onto the preceding numbered entry.
EDIT2: in order to match a) and b) as entries as well:
'^(?:[a-z]\)|\(\d+\))[^#]*?(?=\n\(\d\)|\Z|#|\n[a-z]\))'
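A sketch of the EDIT2 regex in use, again assuming '/.../m' delimiters. The input is a hypothetical variant of the question's data in which sub-entry a) wraps over two lines, to show that sub-entries are unwrapped but kept on their own lines.

```php
<?php
// Sketch: EDIT2 pattern, which also treats "a)", "b)", ... as entries.
$pattern = '/^(?:[a-z]\)|\(\d+\))[^#]*?(?=\n\(\d\)|\Z|#|\n[a-z]\))/m';

// Hypothetical sample: same shape as the question's data, but a) wraps.
$text = <<<'TXT'
#1
(1) This is a test.
a) This is a
subtest one.
b) And another one.
(2) A really cool test.
#2
(1) This is a test.
TXT;

$result = preg_replace_callback($pattern, function (array $m): string {
    // Join wrapped lines with a space; keep a final line break (before "#").
    return preg_replace('/\R(?!\z)/', ' ', $m[0]);
}, $text);

echo $result, PHP_EOL;
```

Because the lookahead also stops at a line starting with a lowercase letter followed by ")", entry (1) ends before a), and a) ends before b). A continuation line that happens to start with something like "e) " would be mistaken for a new sub-entry.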