Regex: stripping multi-line comments but maintaining a line break & Single line comments at start of

Time:10-28

The input text file is as follows:

int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/

The expected output is as follows:

int 1; //integer
int m; //integer
print 23 
end
float

CodePudding user response:

Here is a two-step replacement which seems to work:

import re

inp = """int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/"""

output = re.sub(r'^\s*//.*?\n', '', inp, flags=re.M)
output = re.sub(r'\n?/\*.*?\*/(\n?)', r'\1', output, flags=re.M|re.S)
print(output)

This prints:

int 1; //integer
int m; //integer
print 23 
end
float

The first call to re.sub removes every line that starts with a // comment (optionally preceded by whitespace). The second call removes the C-style /* */ comments. It tries to match an optional newline both before and after the comment itself, and captures the trailing one. The whole match is then replaced with that captured group, so at most a single newline is kept, and only if one followed the comment.
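To make the role of the capture group easier to see, here is a minimal sketch that wraps the same two substitutions in a function and runs it on two tiny inputs. The function name strip_comments_two_step is only an illustration and is not part of the original answer.

```python
import re

def strip_comments_two_step(text: str) -> str:
    # Step 1: drop whole lines whose first non-blank characters are '//'.
    no_line_comments = re.sub(r'^\s*//.*?\n', '', text, flags=re.M)
    # Step 2: drop '/* ... */' blocks. A newline before the comment is
    # consumed; a newline after it is captured in group 1 and put back.
    return re.sub(r'\n?/\*.*?\*/(\n?)', r'\1', no_line_comments,
                  flags=re.M | re.S)

# Comment on its own lines: the two surrounding newlines collapse into one.
print(repr(strip_comments_two_step("a\n/* x\ny */\nb")))        # 'a\nb'
# Comment at the very end: no newline follows, so nothing is put back.
print(repr(strip_comments_two_step("float/* ty;\nint yu;*/")))  # 'float'
```

Note that step 1 only matches a // line that ends in a newline, so a // comment on the very last line of a file without a trailing newline would be left in place.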

CodePudding user response:

You can replace matches of the following regular expression with empty strings.

(?m)^\/\/.*\r?\n|\/\/.*|^\/\*[\s\S]*?\*\/\r?\n|\/\*[\s\S]*?\*\/


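Note that the second alternative must follow the first, and the fourth must follow the third. Alternatives are tried left to right, so the more specific forms, which also consume the line break, have to be attempted before the general ones.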

The regular expression can be broken down as follows.

(?m)       # set multiline flag
  ^\/\/    # match '//' at the beginning of a line
  .*\r?\n  # match 0+ chars other than line
           # terminators, then match a line terminator
|          # or
  \/\/     # match '//'
  .*       # match the remainder of the line
|          # or
  ^\/\*    # match '/*' at the beginning of a line
  [\s\S]*? # match 0+ characters, including line
           # terminators, lazily
  \*\/     # match '*/'
  \r?\n    # match a line terminator
|          # or
  \/\*     # match '/*'
  [\s\S]*? # match 0+ characters, including line
           # terminators, lazily
  \*\/     # match '*/'
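For anyone who wants to try this from Python, the breakdown above could be applied roughly as follows. This is only a sketch: the names COMMENT_RE and strip_comments are made up for illustration, the forward slashes are left unescaped because Python does not require escaping them, and the multiline flag is passed as re.M instead of inline. Be aware that the second alternative also strips trailing // comments, so on the sample input the result differs from the expected output in the question on the first two remaining lines.

```python
import re

# re.X lets the pattern carry whitespace and '#' comments;
# re.M makes '^' match at the start of every line.
COMMENT_RE = re.compile(
    r"""
      ^//.*\r?\n            # whole-line '//' comment plus its line break
    | //.*                  # trailing '//' comment, line break kept
    | ^/\*[\s\S]*?\*/\r?\n  # block comment starting a line, plus line break
    | /\*[\s\S]*?\*/        # any other block comment
    """,
    re.M | re.X,
)

def strip_comments(text: str) -> str:
    # Replace every match with the empty string.
    return COMMENT_RE.sub("", text)

sample = """int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/"""

print(strip_comments(sample))
```

If trailing // comments should be kept, as in the expected output of the question, dropping the second alternative from the pattern is one way to get there.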