I'm trying to write a bash script to replace the newline characters and *s in comments, but only if that comment contains a particular substring.
// file.txt
/**
* Here is a multiline
* comment that contains substring
* rest of it
*/
/**
* Here is a multiline
* comment that does not contain subNOTstring
* rest of it
*/
I would like the final result to be:
// file.txt
/** Here is a multiline comment that contains substring rest of it */
/**
* Here is a multiline
* comment that does not contain subNOTstring
* rest of it
*/
I have a regex that matches multiline comments: \/\*([^*]|[\r\n]|(\* ([^*\/]|[\r\n])))*\*\/
but I can't figure out the second part: matching only the comments that contain the substring, and then replacing every "\n * " with just a single space.
So, to make sure my question is articulated correctly:
- Match a part of the file, i.e. a comment.
- Make sure that match includes "substring".
- Replace every occurrence of a given string within that match with another string, i.e. "\n * " with a single space.
CodePudding user response:
Correctly matching multi-line comments with a regex isn't trivial; it might even be close to impossible.
That said, if they're strictly formatted as shown in your sample, then you can work something up, but it's dangerous nevertheless:
awk '
/^\/\*\*/ { comment = 1 }
/^ \*\// { comment = 0 }
comment { sub(/^ \*/," ") }
1
'
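The awk snippet above strips the leading " *" inside a comment, but it does not join the lines or check for the substring. As a rough sketch of the same line-based idea in Python, under the same strict-formatting assumption (the opener is a line starting with "/**" and the closer is a line containing only "*/"), something like the following might work. The names join_matching_comments and strip_star are hypothetical helpers, not part of any library.

```python
def strip_star(line):
    """Remove surrounding whitespace and one leading '*' from a comment body line."""
    text = line.strip()
    if text.startswith("*"):
        text = text[1:].strip()
    return text


def join_matching_comments(lines, needle):
    """Collapse each multi-line /** ... */ block that contains `needle` into one line.

    Assumption: comments are strictly formatted, one block opener per line
    and a closing line that holds nothing but "*/" (plus optional whitespace).
    """
    out = []
    block = None  # lines of the comment being collected, or None when outside one
    for line in lines:
        if block is None:
            # Only start buffering for a comment that does not end on the same line.
            if line.startswith("/**") and "*/" not in line:
                block = [line]
            else:
                out.append(line)
            continue
        block.append(line)
        if line.strip() == "*/":
            if any(needle in part for part in block):
                body = [strip_star(part) for part in block[1:-1]]
                joined = [block[0].strip()] + [b for b in body if b] + ["*/"]
                out.append(" ".join(joined))
            else:
                out.extend(block)  # no match: emit the comment unchanged
            block = None
    if block is not None:
        out.extend(block)  # unterminated comment: emit what was collected
    return out
```

To apply it to a file, one could read the lines with `open('file.txt').read().splitlines()`, pass them in together with the keyword, and print the result joined with newlines.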
CodePudding user response:
If Python is an option, would you please try:
#!/usr/bin/env python3
import re                               # use the regex module

with open('file.txt') as f:             # open "file.txt" to read
    str = f.read()                      # assign "str" to the contents of the file

# split the file on the comments, including the comments in the result
for i in re.split(r'(/\*.*?\*/)', str, flags=re.DOTALL):
    # if the comment includes the keyword "substring"
    if re.match(r'/\*.*substring', i, flags=re.DOTALL):
        # then replace the newline and the asterisk with a whitespace
        i = re.sub(r'\n \* |\n (?=\*/)', ' ', i)
    print(i, end='')                    # print the element without adding a newline
- re.split(r'(/\*.*?\*/)', str, flags=re.DOTALL) splits "str" on the comments, including the comments themselves in the resulting list.
- The flags=re.DOTALL option makes a dot match newline characters as well.
- The for i in .. syntax loops over the list, assigning "i" to each element.
- re.match(r'/\*.*substring', i, flags=re.DOTALL) matches an element that is a comment containing the keyword "substring".
- re.sub(r'\n \* |\n (?=\*/)', ' ', i) replaces a newline followed by " * " on the next line with a whitespace.
- \n (?=\*/) uses a positive lookahead: it matches a newline and the following space only when they are followed by "*/". It will match the last line of the comment block, leaving the "*/" as is.
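For reuse with other keywords, the same split-and-substitute idea could be wrapped in a function. This is only a sketch: collapse_comments is a hypothetical name, the keyword is passed as a plain string instead of being embedded in the regex, and the substitution pattern is loosened (an assumption on my part) so that it tolerates any amount of leading spaces or tabs before the asterisk.

```python
import re

# Non-greedy match of one /* ... */ comment; the capturing group makes
# re.split keep the comments in the resulting list.
COMMENT = re.compile(r'(/\*.*?\*/)', flags=re.DOTALL)

# Either a newline plus an indented "*" that does not start "*/" (and one
# optional following blank), or a newline plus indentation right before "*/".
LINE_PREFIX = re.compile(r'\n[ \t]*\*(?!/)[ \t]?|\n[ \t]*(?=\*/)')


def collapse_comments(text, needle):
    """Join every comment in `text` that contains `needle` onto a single line."""
    pieces = []
    for index, piece in enumerate(COMMENT.split(text)):
        # With one capturing group, the odd-indexed items are the comments.
        if index % 2 == 1 and needle in piece:
            piece = LINE_PREFIX.sub(' ', piece)
        pieces.append(piece)
    return ''.join(pieces)
```

A call such as `print(collapse_comments(open('file.txt').read(), 'substring'), end='')` would then behave like the script above. Using `needle in piece` avoids having to escape regex metacharacters in the keyword.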