Let's say I have the following log file (with line endings):
[xxx] test test[xxx]foobar
more data
[xxx] more data
[xxx] other data []:foo bar
more data here
[xxx] 1234
I would like to retrieve all parts starting with [xxx] up until the next occurrence of [xxx], so the result would become (\n indicating the newline here):
$result = [
'[xxx] test test[xxx]foobar \n more data',
'[xxx] more data',
'[xxx] other data []:foo bar \n more data here',
'[xxx] 1234'
]
I came up with the regex /(\[xxx\] .*)/g but it fails to match the cases where there are multiple lines per log entry. I've tried variations like /(\[xxx\] [\s.]*)/g but to no avail.
I feel like I'm missing something obvious here. What modifiers or other syntax should I use?
CodePudding user response:
You can use either of
preg_match_all('~\[xxx].*(?:\R(?!\[xxx]).*)*~', $text, $matches)
preg_match_all('~\[xxx].*?(?=\[xxx]|\z)~s', $text, $matches)
Or, if the left-hand [xxx] always appears at the start of a line:
preg_match_all('~^\[xxx].*(?:\R(?!\[xxx]).*)*~m', $text, $matches)
preg_match_all('~^\[xxx].*?(?=^\[xxx]|\z)~ms', $text, $matches)
The first pattern of each pair is preferable because it is more efficient: it consumes a whole line at a time, whereas the second, lazy variant has to test its lookahead at every character.
Details:
^ - start of a line
\[xxx] - a [xxx] string
.* - the rest of the line
(?:\R(?!\[xxx]).*)* - zero or more sequences of
    \R(?!\[xxx]) - a line break sequence not immediately followed with [xxx]
    .* - the rest of the line.
The ^\[xxx].*?(?=^\[xxx]|\z) regex matches [xxx] at the start of a line, then any zero or more chars as few as possible, and then either a position immediately followed with [xxx] at the start of a line or end of string.
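As a minimal sketch of how the first line-anchored pattern behaves on the sample log from the question (the variable names $text and $entries are only illustrative, and the log is assumed to use \n line endings):

```php
<?php
// Sample log from the question; double quotes so that \n is a real newline.
$text = "[xxx] test test[xxx]foobar\nmore data\n[xxx] more data\n"
      . "[xxx] other data []:foo bar\nmore data here\n[xxx] 1234";

// ^\[xxx]              : [xxx] at the start of a line (m flag)
// .*                   : the rest of that line (no s flag, so . stops at the line break)
// (?:\R(?!\[xxx]).*)*  : any following lines that do not start with [xxx]
preg_match_all('~^\[xxx].*(?:\R(?!\[xxx]).*)*~m', $text, $matches);

// $matches[0] holds the full matches, one per log entry.
$entries = $matches[0];
print_r($entries);
```

Each element of $entries is one complete log entry, with continuation lines still joined by their original line break.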
CodePudding user response:
An alternative PHP solution using preg_split and preg_replace with a simple regex:
$data = '[xxx] test test[xxx]foobar
more data
[xxx] more data
[xxx] other data []:foo bar
more data here
[xxx] 1234';
foreach (preg_split('/^(?=\[xxx] )/m', $data) as $el) {
    echo preg_replace('/\n(?!$)/', '\\n', $el);
}
Output:
[xxx] test test[xxx]foobar\nmore data
[xxx] more data
[xxx] other data []:foo bar\nmore data here
[xxx] 1234
Breakup:
/^(?=\[xxx] )/m : used in preg_split so that the input text is split every time [xxx] followed by a space appears at the start of a line
/\n(?!$)/ : used in preg_replace to replace each \n inside an element of the split array (but not a trailing one) with a literal \n (written '\\n' in PHP)
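If the goal is the $result array from the question rather than printed output, the same split can be collected directly. This is only a sketch under the assumption that entries always start with "[xxx] " at the beginning of a line; the names $parts and $result are illustrative:

```php
<?php
$data = "[xxx] test test[xxx]foobar\nmore data\n[xxx] more data\n"
      . "[xxx] other data []:foo bar\nmore data here\n[xxx] 1234";

// Split before every "[xxx] " at a line start. PREG_SPLIT_NO_EMPTY drops the
// empty piece produced by the zero-width match at the very start of the text.
$parts = preg_split('/^(?=\[xxx] )/m', $data, -1, PREG_SPLIT_NO_EMPTY);

// Every piece except the last still ends with its line break; strip it.
$result = array_map(function ($el) {
    return rtrim($el, "\r\n");
}, $parts);

print_r($result);
```

Without PREG_SPLIT_NO_EMPTY, preg_split returns an empty string as the first element, which the echo loop above hides but an array consumer would see.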