I have the following JavaScript excerpt stored as a text string:
for (let orange of oranges) {
  for (let apple of apples) {
    for (let banana of bananas) {
      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');
      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);

    }

  }

}
from which I would like to remove the excess newlines at the bottom, so that it becomes:
for (let orange of oranges) {
  for (let apple of apples) {
    for (let banana of bananas) {
      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');
      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);
    }
  }
}
I have written this regex:
/(;|})(\n(\h*))+}/
in the following PHP:
$myString = preg_replace('/(;|})(\n(\h*))+}/', "\$1\n\$3}", $myString);
but for reasons I can't ascertain, the newline between the first closing curly brace and the second isn't being removed.
I have tested the regex on Regex101 (i.e. outside PHP's preg_replace() function), and it still finds only two matches instead of three.
I really can't see where I'm going wrong with the regex.
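A minimal PHP sketch that reproduces the behaviour described above. It assumes the input has exactly one blank line before each closing brace (the original blank lines are not fully preserved in the question), and the variable names are illustrative only.

```php
<?php
// Assumed input: one blank line before each closing brace.
$js = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "",
    "    }",
    "",
    "  }",
    "",
    "}",
]);

// Pattern and replacement taken from the question.
$pattern = '/(;|})(\n(\h*))+}/';

// preg_match_all() returns the number of full matches found.
$count = preg_match_all($pattern, $js);
echo "Matches: $count\n";

// The blank line between "    }" and "  }" survives the replacement.
$result = preg_replace($pattern, "\$1\n\$3}", $js);
echo $result, "\n";
```

With this input only two gaps are matched, so one blank line remains in the output.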
Answer:
You consume (i.e. match, add the matched text to the overall match memory buffer, and advance the regex index) the ; or } and then the } that follows one or more newlines. Once a substring has been consumed, the next match cannot consume the same text, so the } that ends one match cannot also be the ; or } that starts the next one.
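To see the consumption directly, the sketch below (assuming one blank line before each closing brace; names are illustrative) prints every match of the question's pattern together with its byte offset via PREG_OFFSET_CAPTURE.

```php
<?php
// Assumed input: one blank line before each closing brace.
$js = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "",
    "    }",
    "",
    "  }",
    "",
    "}",
]);

// Collect each full match along with the offset where it starts.
preg_match_all('/(;|})(\n(\h*))+}/', $js, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$text, $offset]) {
    // json_encode() makes the newlines inside each match visible.
    printf("offset %d: %s\n", $offset, json_encode($text));
}
// Match 1 ends with the first "}", so match 2 can only begin at the second "}".
```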
You may use lookarounds (or \K) to work around this:
preg_replace('~([;}])\h*\R(?=\h*(?:\R\h*)+})~', '$1', $text)
preg_replace('~(?<=[;}])\h*\R(?=\h*(?:\R\h*)+})~', '', $text)
preg_replace('~[;}]\K\h*\R(?=\h*(?:\R\h*)+})~', '', $text)
See the regex demo (or this regex demo).
Note that in the last two examples there is no need for a $1 backreference, as the pattern contains no capturing group: the group was replaced with a non-consuming lookbehind, (?<=[;}]), or \K was used to clear the current match memory buffer.
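A sketch that runs the three variants above against the sample input (assumed to have one blank line before each closing brace) and checks each result against the desired output. The $variants labels are illustrative only.

```php
<?php
// Assumed input: one blank line before each closing brace.
$js = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "",
    "    }",
    "",
    "  }",
    "",
    "}",
]);

// Desired output: no blank lines before the closing braces.
$expected = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "    }",
    "  }",
    "}",
]);

// The three variants from the answer, as [pattern, replacement] pairs.
$variants = [
    'group + $1' => ['~([;}])\h*\R(?=\h*(?:\R\h*)+})~', '$1'],
    'lookbehind' => ['~(?<=[;}])\h*\R(?=\h*(?:\R\h*)+})~', ''],
    '\K'         => ['~[;}]\K\h*\R(?=\h*(?:\R\h*)+})~', ''],
];

$results = [];
foreach ($variants as $name => [$regex, $replacement]) {
    $results[$name] = preg_replace($regex, $replacement, $js);
    printf("%-10s %s\n", $name, $results[$name] === $expected ? 'OK' : 'MISMATCH');
}
```

Because the lookahead never consumes the closing }, each brace is still available as the starting ; or } of the next match. Note that each of these patterns removes only one line break per gap.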
Details:
([;}]) - capturing group #1: a ; or } char
(?<=[;}]) - a positive lookbehind that requires ; or } to appear immediately to the left of the current location
[;}]\K - a ; or }, and then the \K operator "loses" the text matched (the ; or } is removed from the match memory buffer)
\h* - zero or more horizontal whitespaces
\R - a line break sequence
(?=\h*(?:\R\h*)+}) - a positive lookahead that matches a location that is immediately followed with
  \h* - zero or more horizontal whitespaces
  (?:\R\h*)+ - one or more occurrences of a line break sequence and zero or more horizontal whitespaces
  } - a } char.
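A small sketch contrasting the three ways of handling the leading ; listed in the Details. The one-line subject string is made up for illustration.

```php
<?php
// Made-up subject: a ";" followed by trailing spaces and a line break.
$subject = "x;  \n}";

// Consuming group: ";" is part of the match, so it must be restored with $1.
preg_match('~([;}])\h*\R~', $subject, $consumed);

// Lookbehind: ";" is only checked to the left, never part of the match.
preg_match('~(?<=[;}])\h*\R~', $subject, $lookbehind);

// \K: ";" is matched first, then dropped from the reported match.
preg_match('~[;}]\K\h*\R~', $subject, $withK);

var_dump($consumed[0], $lookbehind[0], $withK[0]);
```

Only the first variant reports the ; as part of the match; the other two report just the whitespace and the line break, which is why they can use an empty replacement.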
Answer:
Your pattern matches the last line including the }, so that } has been consumed and cannot take part in the next match attempt.
If you want to remove all "empty" lines in between, change your pattern so that the final newline, the horizontal whitespace chars and the } to the right are only asserted with a lookahead, not consumed:
(;|})(\n(\h*))+(?=\n\h*})
In the replacement, use group 1: $1
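A sketch applying this lookahead version with $1 to the sample input, assuming one blank line before each closing brace. Variable names are illustrative.

```php
<?php
// Assumed input: one blank line before each closing brace.
$js = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "",
    "    }",
    "",
    "  }",
    "",
    "}",
]);

// Desired output: no blank lines before the closing braces.
$expected = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "    }",
    "  }",
    "}",
]);

// The last newline, indentation and "}" are only asserted, so each "}" stays
// available as the start of the next match.
$result = preg_replace('/(;|})(\n(\h*))+(?=\n\h*})/', '$1', $js);

echo $result === $expected ? "all blank lines removed\n" : "unexpected output\n";
```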
The pattern can also be written using \K, which makes the first capture group unnecessary. The other superfluous capture groups can be omitted as well, a character class [;}] can be used instead of an alternation, and \R matches any Unicode newline sequence instead of only \n:
[;}]\K(?:\R\h*)*(?=\R\h*})
In the replacement, use an empty string.
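A sketch of the \K version with an empty replacement. To illustrate the point about \R, it also runs the same text with Windows (CRLF) line endings, where the \n-only pattern finds nothing to replace. The sample input (one blank line before each closing brace) and the variable names are assumptions.

```php
<?php
// Assumed input: one blank line before each closing brace.
$js = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "",
    "    }",
    "",
    "  }",
    "",
    "}",
]);
$expected = implode("\n", [
    "for (let orange of oranges) {",
    "  for (let apple of apples) {",
    "    for (let banana of bananas) {",
    "      obfuscatedArray[i] = obfuscatedArray[i].split('').reverse().join('');",
    "      obfuscatedArray[i] = window.atob(obfuscatedArray[i]);",
    "    }",
    "  }",
    "}",
]);

// \K drops the ";" or "}" from the match, so nothing has to be put back.
$kPattern = '~[;}]\K(?:\R\h*)*(?=\R\h*})~';
$lf = preg_replace($kPattern, '', $js);

// Same text with CRLF line endings: \R treats "\r\n" as a single line break.
$crlfInput = str_replace("\n", "\r\n", $js);
$crlf = preg_replace($kPattern, '', $crlfInput);

// The \n-only pattern never matches here, because every "\n" follows a "\r".
$nOnly = preg_replace('/(;|})(\n(\h*))+(?=\n\h*})/', '$1', $crlfInput);

var_dump($lf === $expected);
var_dump($crlf === str_replace("\n", "\r\n", $expected));
var_dump($nOnly === $crlfInput);
```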
As you want to match all "empty" lines in between, you can replace (?:\R\h*)* with \s*, shortening the pattern to:
[;}]\K\s*(?=\R\h*})
The pattern matches:
[;}] - match either ; or }
\K - forget what is matched so far (clear the current match buffer)
\s* - match optional whitespace chars
(?=\R\h*}) - positive lookahead: assert, from the current position, a newline, optional horizontal whitespace chars and a }
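A sketch of the shortened \s* pattern on a made-up snippet (the if/z() code is hypothetical) that has several blank lines, one of them containing only spaces, before the closing braces. Because \s* backtracks until the lookahead can match the last line break before the }, the whole run of blank lines is removed in one match.

```php
<?php
// Hypothetical snippet with runs of blank and whitespace-only lines before "}".
$messy = "if (x) {\n  if (y) {\n    z();\n\n   \n  }\n\n\n}\n";

// \K keeps the ";" or "}", and the lookahead keeps the final newline,
// the indentation and the "}" itself.
$tidy = preg_replace('~[;}]\K\s*(?=\R\h*})~', '', $messy);

echo $tidy;
```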