How to unwrap multiple multi-line text to single lines?

Time:10-05

I have data that looks like this:

#1
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?

#2
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?

#3
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?

I need a regular expression that matches each multi-line entry so I can unwrap it to a single line.

I've tried this:

$pattern = '/^(\(?[a-z0-9]+\) )([\s\S]+?(?!#))(^\(?[a-z0-9]+\))/mS';

$text = preg_replace_callback ($pattern, function ($grp) {
    return $grp[1] . unwrap ($grp[2]) . PHP_EOL . $grp[3];
}, $text);

I feel like this should be a simple regex to write, but I'm having trouble for some reason.

Answer:

You can match every entry with a lookahead and then unwrap each whole match. Use the pattern with the m (multiline) flag so that ^ matches at the start of every line:

'/^\(\d+\)[^#]*?(?=\n\(\d+\)|\Z|#)/m'


EDIT: From your question it's not clear how you want to handle sub-entries like a) and b). With this regex they are treated as ordinary text, so they get joined onto the numbered entry above them.
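Neither the question nor the answer defines unwrap(). A minimal PHP sketch of applying the first pattern, assuming that unwrapping an entry means joining its lines with single spaces: unwrap_entry() is a hypothetical helper, and record #2 is made-up input that ends in a wrapped entry. The pattern is written with the m flag and with \d+ in the lookahead, so numbers like (10) also count as entry starts. The helper puts trailing whitespace back unchanged, because the # alternative makes the match for (4) end with the blank line before #2; collapsing that too would pull #2 onto the same line.

```php
<?php
// Hypothetical stand-in for the question's unwrap(): joins the lines of one
// entry with single spaces and re-appends any trailing whitespace unchanged
// (e.g. the blank line that separates an entry from the next "#" record).
function unwrap_entry(string $entry): string
{
    $body = rtrim($entry);
    $tail = substr($entry, strlen($body));
    return preg_replace('/[ \t]*\R[ \t]*/', ' ', $body) . $tail;
}

// Illustrative input: record #1 from the question plus a made-up record #2.
$text = <<<'TXT'
#1
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?

#2
(1) A second record whose entry
also wraps.
a) A sub-entry that
wraps too.
TXT;

// First pattern: an entry runs from "(n)" at the start of a line up to the
// next "(n)" line, a "#", or the end of the text. /m makes ^ match per line.
$pattern = '/^\(\d+\)[^#]*?(?=\n\(\d+\)|\Z|#)/m';

// Replace every matched entry with its unwrapped version.
$result = preg_replace_callback(
    $pattern,
    fn (array $m): string => unwrap_entry($m[0]),
    $text
);

echo $result, PHP_EOL;
```

Running it leaves (2) and (4) as they were, joins (3) into one line, and folds a) and b) into entry (1), giving "(1) This is a test. a) This is a subtest one. b) And another one." Record #2 likewise collapses to a single line.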

EDIT2: To match a) and b) as separate entries as well, use:

'/^(?:[a-z]\)|\(\d+\))[^#]*?(?=\n\(\d+\)|\Z|#|\n[a-z]\))/m'
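The same kind of sketch with the EDIT2 pattern, again assuming a hypothetical unwrap_entry() helper that joins lines with single spaces, and a made-up record #2 with a wrapped sub-entry. Because a line starting with a letter and ")" now also ends an entry, each sub-entry is matched and unwrapped on its own.

```php
<?php
// Hypothetical stand-in for the question's unwrap(): joins the lines of one
// entry with single spaces and re-appends any trailing whitespace unchanged.
function unwrap_entry(string $entry): string
{
    $body = rtrim($entry);
    $tail = substr($entry, strlen($body));
    return preg_replace('/[ \t]*\R[ \t]*/', ' ', $body) . $tail;
}

// Illustrative input: record #1 from the question plus a made-up record #2.
$text = <<<'TXT'
#1
(1) This is a test.
a) This is a subtest one.
b) And another one.
(2) A really cool test.
(3) Here is the problem, text for each numbered line is
supposed to be on a single line like in (1) and (2), but
the text often spans multiple lines of text.
(4) How can I match the multi-line entries and unwrap them to single lines?

#2
(1) A second record whose entry
also wraps.
a) A sub-entry that
wraps too.
TXT;

// EDIT2 pattern: an entry starts with "(n)" or "x)" at the start of a line
// and also stops before a line that begins with a sub-entry marker.
$pattern = '/^(?:[a-z]\)|\(\d+\))[^#]*?(?=\n\(\d+\)|\Z|#|\n[a-z]\))/m';

// Replace every matched entry or sub-entry with its unwrapped version.
$result = preg_replace_callback(
    $pattern,
    fn (array $m): string => unwrap_entry($m[0]),
    $text
);

echo $result, PHP_EOL;
```

Record #1 keeps a) and b) on their own lines, and record #2 comes out as "(1) A second record whose entry also wraps." followed by "a) A sub-entry that wraps too." on the next line. One caveat of this pattern: a wrapped continuation line that happens to begin with a letter followed by ")" would be taken for a new sub-entry.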

