Bash script to replace matched substrings within larger substring


I'm trying to write a bash script to replace the newline characters and *s from comments, but only if that comment contains a particular substring.

// file.txt
/**
 * Here is a multiline
 * comment that contains substring
 * rest of it
 */

/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */

I would like the final result to be:

// file.txt
/** Here is a multiline comment that contains substring rest of it */

/**
 * Here is a multiline
 * comment that does not contain subNOTstring
 * rest of it
 */

I have a regex that matches multiline comments: \/\*([^*]|[\r\n]|(\* ([^*\/]|[\r\n])))*\*\/ but I can't figure out the second part: matching only the comments that contain the substring, and then replacing every \n * with a single space.

So, to make sure my question is articulated correctly:

  1. Match a substring within a file, i.e. a comment.
  2. Make sure that match contains the substring.
  3. Replace every occurrence of a pattern within that match with another string, i.e. \n * with a space.

CodePudding user response:

Correctly matching multi-line comments with a regex isn't trivial; it might even be close to impossible.

That said, if they're strictly formatted as shown in your sample, then you can work something up, but it's still dangerous:

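awk '
    # assumes every comment opens with a line starting "/**" and closes with a line " */"
    /^\/\*\*/ { comment = 1; raw = flat = "" }             # comment opens: reset both buffers
    !comment  { print; next }                               # outside a comment: pass the line through
    { raw = raw $0 "\n"; last = ($0 ~ /^ \*\//)             # keep the original text; detect the closing line
      if (!last) sub(/^ \*/, ""); flat = flat $0 }          # strip the leading " *" and append to one line
    last { printf "%s", (raw ~ /substring/ ? flat "\n" : raw); comment = 0 }
' file.txt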

CodePudding user response:

If Python is an option, you could try:


import re                                                       # regular-expression module

with open('file.txt') as f:                                     # open "file.txt" for reading
    text = f.read()                                             # read the whole file into "text"

for i in re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL):       # split the text on comments, keeping the comments in the result
    if re.match(r'/\*.*substring', i, flags=re.DOTALL):         # if the element is a comment that includes the keyword "substring"
        i = re.sub(r'\n \* |\n (?=\*/)', ' ', i)                # then replace each newline and leading asterisk with a space
    print(i, end='')                                            # print the element without adding a newline

  • re.split(r'(/\*.*?\*/)', text, flags=re.DOTALL) splits "text" on the comments; because the pattern is wrapped in a capture group, the comments themselves are kept in the resulting list.
  • The flags=re.DOTALL option makes a dot match newline characters as well.
  • The for i in .. syntax loops over the list, assigning each element to "i".
  • re.match(r'/\*.*substring', i, flags=re.DOTALL) matches an element that is a comment containing the keyword "substring".
  • re.sub(r'\n \* |\n (?=\*/)', ' ', i) replaces a newline followed by " * " on the next line with a single space.
  • \n (?=\*/) uses a positive lookahead: it matches a newline and a space only when they are followed by "*/". It handles the last line of the comment block, leaving the "*/" as is.
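
The same idea can also be written with a replacement function passed to re.sub, so each comment is matched, checked for the keyword, and rewritten in one pass. This is only a minimal sketch, assuming comments are formatted exactly as in the question; the names flatten_comments, collapse, keyword, COMMENT and sample are hypothetical and chosen for illustration.

```python
import re

# Non-greedy match with DOTALL so each /* ... */ comment is matched separately.
COMMENT = re.compile(r'/\*.*?\*/', flags=re.DOTALL)


def flatten_comments(text, keyword):
    """Collapse every /* ... */ comment that contains `keyword` onto one line."""

    def collapse(match):
        comment = match.group(0)
        if keyword not in comment:
            return comment  # leave comments without the keyword untouched
        # Newline + " * " becomes a space; the final newline + " " before
        # "*/" becomes a space as well, leaving the "*/" in place.
        return re.sub(r'\n \* |\n (?=\*/)', ' ', comment)

    return COMMENT.sub(collapse, text)


# Sample input assumed to mirror the file in the question.
sample = (
    "/**\n * Here is a multiline\n * comment that contains substring\n"
    " * rest of it\n */\n"
    "\n"
    "/**\n * Here is a multiline\n * comment that does not contain subNOTstring\n"
    " * rest of it\n */\n"
)

print(flatten_comments(sample, "substring"), end='')
```

Text outside comments passes through unchanged, and the plain "in" test could be swapped for a regex search if the keyword needs word boundaries.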