I'm struggling to create a regex that captures what is included between two keywords in a multi-line file.
In particular, consider the following file:
#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS
#%BODY
....
#%ENDS
#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS
#%BODY
....
#%ENDS
I want to parse what is included between the #%META and #%ENDS keywords, if possible without the leading #, i.e., the desired result is to capture both:
date: 2022-08-27
generated-by: Me
id: 1
and
date: 2022-08-27
generated-by: Another Me
id: 2
I came up with the following regex: (?<=#%META\n)([\S\s]*?)(?=#%ENDS\n)
However, it does not identify the two chunks of text to be matched, nor does it remove the leading #.
Could anyone help with that?
Thanks a lot! :)
CodePudding user response:
You might use a pattern to first capture all the parts between #%META and #%ENDS, and then post-process the capture group 1 values, removing the leading # followed by optional spaces.
^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$
Explanation
^
Start of string
#%META
Match literally
(
Capture group 1
(?>
Atomic group
\R
Match any unicode newline sequence
(?!#%(?:META|ENDS)$)
Negative lookahead, assert that the line is not #%META or #%ENDS
.*
Match the whole line
)+
Close the atomic group and repeat 1+ times
)
Close group 1
\R
Match any unicode newline sequence
#%ENDS
Match literally
$
End of string
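The breakdown above can also be written directly into the pattern using the x (extended) modifier. This is only a sketch: the $sample string and variable names are illustrative assumptions, and it assumes \n line endings. Note that in x mode a bare # starts a comment, so the literal # has to be escaped as \#.

```php
<?php
// Same pattern in extended (x) mode, so each part carries its own comment.
// A bare "#" would start a comment here, hence the escaped "\#".
$re = '/
    ^\#%META                    # literal start marker at the beginning of a line
    (                           # capture group 1
      (?>                       # atomic group
        \R                      # any newline sequence
        (?!\#%(?:META|ENDS)$)   # the next line is neither marker
        .*                      # the rest of that line
      )+                        # one or more such lines
    )
    \R\#%ENDS$                  # newline followed by the end marker
/mx';

// Illustrative input (an assumption, shortened from the file in the question).
$sample = "#%META\n# date: 2022-08-27\n# id: 1\n#%ENDS\n#%BODY\n....\n#%ENDS\n"
        . "#%META\n# date: 2022-08-27\n# id: 2\n#%ENDS\n";

$count = preg_match_all($re, $sample, $matches);
var_dump($count, $matches[1]);
```

Group 1 still contains the leading newline and the "# " prefixes at this point, which is why a second clean-up step is needed afterwards.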
Example
$re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
$str = '#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS
#%BODY
....
#%ENDS
#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS
#%BODY
....
#%ENDS';
if (preg_match_all($re, $str, $matches)) {
    $result = array_map(function ($s) {
        return preg_replace("/^#\h*/m", "", trim($s));
    }, $matches[1]);
    var_export($result);
}
Output
array (
0 => 'date: 2022-08-27
generated-by: Me
id: 1',
1 => 'date: 2022-08-27
generated-by: Another Me
id: 2',
)
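If the extraction is needed in more than one place, the two steps could be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: the function name extractMetaBlocks and the $sample string are hypothetical, and it assumes the whole file is available as one string with \n line endings (the arrow function requires PHP 7.4 or later).

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer above).
// Step 1: capture every block between #%META and #%ENDS.
// Step 2: strip the leading "#" and optional horizontal whitespace from each line.
function extractMetaBlocks(string $text): array
{
    $re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';

    if (!preg_match_all($re, $text, $matches)) {
        return []; // no META block found
    }

    return array_map(
        static fn (string $block) => preg_replace('/^#\h*/m', '', trim($block)),
        $matches[1]
    );
}

// Illustrative input (an assumption, shortened from the file in the question).
$sample = "#%META\n# date: 2022-08-27\n# id: 1\n#%ENDS\n#%BODY\n....\n#%ENDS\n"
        . "#%META\n# date: 2022-08-27\n# id: 2\n#%ENDS\n";

$blocks = extractMetaBlocks($sample);
print_r($blocks);
```

The #%BODY sections are skipped because a match can only start on a line that is exactly #%META.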
CodePudding user response:
To find all matches you need preg_match_all(); preg_match() stops after the first one. The /m modifier only makes ^ and $ match at line boundaries.
Try this:
$str = preg_replace_callback(
    '/^# (.+)$/m',
    static function ($m) {
        return $m[1];
    },
    $str
); // or just str_replace('# ', '', $str)
preg_match_all('/(?<=#%META\n)([\S\s]*?)(?=#%ENDS\n)/', $str, $m);
var_dump($m[1]);