Suppose I have this text:
cat file
/* comment */ not a comment /* another comment */
/* delete this *
/* multiline *
/* comment */
/*************
/* and this *
/************/
The End
I can use perl with a conditional ? : to delete only the multiline comments:
perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)/($1=~qr"\R") ? "" : $1/eg;' file
Prints:
/* comment */ not a comment /* another comment */
The End
Without the conditional:
perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)//g;' file
not a comment
The End
Is there a way to delete only multiline C-style comments with a regex alone, i.e., without using the Perl conditional code in the replacement?
CodePudding user response:
You can use
perl -0777 -pe 's~/\*(?:(?!\*/|/\*).)*\R(?s).*?\*/~~g' file
The pattern matches:
/\* - a /* string
(?:(?!\*/|/\*).)* - zero or more chars other than line break chars, each of which is not the starting char of a */ or /* char sequence
\R - a line break sequence
(?s) - from now on, . will also match line breaks
.*? - any zero or more chars, as few as possible
\*/ - a */ substring.
See the regex demo.
CodePudding user response:
With a SKIP/FAIL approach:
perl -0777 -pe's~/\*\N*?\*/(*SKIP)^|/\*.*?\*/~~gs' file
\N matches any character that isn't a newline. The dot matches all characters, including newlines, since the s flag is used.
The first branch matches "inline" comments and is forced to fail with ^ (shorter than writing (*F) or (*FAIL), but with the same result). The (*SKIP) backtracking control verb prevents the regex engine from retrying previous positions, so the next attempt starts after the position of the closing */.
The second branch matches the remaining comments, which are necessarily multiline.
A shorter variant, with the same two branches, but this time using \K to exclude the consumed characters from the match result:
perl -0777 -pe's~/\*\N*?\*/\K|/\*.*?\*/~~gs' file
This time the first branch succeeds, but since all characters before \K are removed from the match result, the remaining empty string is replaced with an empty string, which leaves the inline comment untouched.
These two replacements do the same thing as something like:
s~(/\*.*?\*/)|/\*[\s\S]*?\*/~$1~g
which is more portable, but they take less effort to write.