Only find multiline C comment but not single line C comments

Time:12-23

Suppose I have this text:

cat file
/* comment */ not a comment /* another comment */

/* delete this  *
/* multiline    *
/* comment      */

/*************
/* and this  *  
/************/
The End

I can use Perl with a conditional (? :) in the replacement to delete only the multiline comments:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)/($1=~qr"\R") ? "" : $1/eg;' file

Prints:

/* comment */ not a comment /* another comment */




The End

Without the conditional:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)//g;' file

Prints:

 not a comment 




The End

Is there a way to delete only multiline C-style comments using a regex alone, i.e., without the Perl conditional code in the replacement?

CodePudding user response:

You can use

perl -0777 -pe 's~/\*(?:(?!\*/|/\*).)*\R(?s).*?\*/~~g' file

The pattern matches

  • /\* - a /* string
  • (?:(?!\*/|/\*).)* - zero or more characters other than line breaks, none of which starts a */ or /* sequence
  • \R - a line break sequence
  • (?s) - from this point on, . also matches line breaks
  • .*? - zero or more characters, as few as possible
  • \*/ - a */ substring.

See the regex demo.
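
For readers who want to try the same idea outside Perl, here is a minimal sketch in Python's re module. It is an assumed port, not the answer's own code. re has no \R, so \r?\n stands in for it, and the mid-pattern (?s) is written as a scoped (?s:...) group (Python 3.6+). The names MULTILINE_COMMENT, strip_multiline_comments and SAMPLE are hypothetical.

```python
import re

# Hypothetical port of the Perl pattern above.
# Assumption: \r?\n replaces \R, which Python's re does not support.
MULTILINE_COMMENT = re.compile(
    r"/\*"                # opening /*
    r"(?:(?!\*/|/\*).)*"  # same-line characters that start neither */ nor /*
    r"\r?\n"              # a line break, so the comment must span lines
    r"(?s:.*?)"           # anything, line breaks included, as few as possible
    r"\*/"                # closing */
)


def strip_multiline_comments(text: str) -> str:
    """Remove comments that contain a line break; keep one-line comments."""
    return MULTILINE_COMMENT.sub("", text)


# The sample file from the question.
SAMPLE = (
    "/* comment */ not a comment /* another comment */\n"
    "\n"
    "/* delete this  *\n"
    "/* multiline    *\n"
    "/* comment      */\n"
    "\n"
    "/*************\n"
    "/* and this  *\n"
    "/************/\n"
    "The End\n"
)

if __name__ == "__main__":
    print(strip_multiline_comments(SAMPLE), end="")
```

On the sample text this should leave the first line and "The End" in place, with only blank lines between them.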

CodePudding user response:

With a SKIP/FAIL approach:

perl -0777 -pe's~/\*\N*?\*/(*SKIP)^|/\*.*?\*/~~gs' file

demo

\N matches any character that isn't a line break.
The dot matches all characters including newlines since the s flag is used.

The first branch matches "inline" comments and is then forced to fail by ^ (shorter than writing (*F) or (*FAIL), with the same result, since ^ cannot match once the position is past the start of the string). The (*SKIP) backtracking control verb prevents the regex engine from retrying at earlier positions, so the next attempt starts after the closing */.

The second branch matches the remaining comments, which are necessarily multiline.


A shorter variant, with the same two branches but this time using \K to exclude the consumed characters from the match result:

perl -0777 -pe's~/\*\N*?\*/\K|/\*.*?\*/~~gs' file

demo

This time the first branch succeeds, but since all characters before \K are removed from the match result, the remaining empty string is replaced with an empty string.


These two replacements are equivalent to doing something like:

s~(/\*.*?\*/)|/\*[\s\S]*?\*/~$1~g

which is more portable; the two versions above just get there with less effort.
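
To illustrate the portability point, here is a minimal sketch of that capture-and-restore replacement in Python, whose re module has no (*SKIP) or \K. The names COMMENT, strip_multiline_comments and SAMPLE are hypothetical, and the callback is just one way to write the $1 replacement.

```python
import re

# Branch 1 captures a comment that closes on the same line ('.' does not
# match '\n' because re.DOTALL is not set). Branch 2 matches the comments
# that remain, which must span several lines, and captures nothing.
COMMENT = re.compile(r"(/\*.*?\*/)|/\*[\s\S]*?\*/")


def strip_multiline_comments(text: str) -> str:
    """Put back group 1 when it matched; otherwise replace with nothing."""
    return COMMENT.sub(lambda m: m.group(1) or "", text)


# The sample file from the question.
SAMPLE = (
    "/* comment */ not a comment /* another comment */\n"
    "\n"
    "/* delete this  *\n"
    "/* multiline    *\n"
    "/* comment      */\n"
    "\n"
    "/*************\n"
    "/* and this  *\n"
    "/************/\n"
    "The End\n"
)

if __name__ == "__main__":
    print(strip_multiline_comments(SAMPLE), end="")
```

The callback returns the captured one-line comment unchanged and an empty string for everything matched by the second branch, which mirrors what $1 does in the Perl substitution.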
