Regex remove hash comments exclude curly brackets

Time:05-04

I want to remove comments, meaning everything from a # to the end of the line, with one special case: a # inside curly braces is not a comment. A # that appears before an opening brace still starts a comment, and that comment runs to the end of the line, braces included.

Input:

    This is 1. # Comment 1.
  This is 2. # Comment 2

  #Comment 3 {#Commit4}
  This is 3.

  # Commit5

  This is 4.{#This is 5} # Commit6

    #Commit7

  This is 6.

  # Commit8 #Commit9
  #Commit10
  # Commit11
  #Commit12 #Commit13
  {  # This is 7}; { # This is 8 } # Commit14
  {# This is 9}; { # This is 10 }={# This is 11} {# This is 12}={# This is 13}# Commit15
  
  # Commit16 {# Commit17   }

Output:

This is 1.
This is 2.
This is 3.
This is 4.{#This is 5} 
This is 6.
{  # This is 7}; { # This is 8 }
{# This is 9}; { # This is 10 }={# This is 11} {# This is 12}={# This is 13}

I want to do this with the sub function from Python 3's built-in re module. My sample code is below, but it does not remove all the comments (by my definition):

import re

reStr = str(input)  # input holds the text shown under "Input:"
reStr = re.sub(r"((^|\n)(.*[^{])?)(#[^\n]+)", r'\1', reStr)
print(reStr)

If you have a better solution, please let me know. Thanks.

CodePudding user response:

You can use the regex pattern (#[^{}\n]+)($|{) and replace with \2. See https://regex101.com/r/grDMqH/1

CodePudding user response:

Using the fact that overlapping matches aren't allowed, here is one way to do so:

({.*?})|#.*

Replace with: \1
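
For reference, here is a minimal Python sketch of this approach. The function name strip_hash_comments is made up for illustration, and the whitespace trimming and blank-line removal at the end are assumptions based on the expected output in the question, not part of the regex itself. The braces are escaped as \{ and \}, which is equivalent to the pattern above in Python's re module.

```python
import re

# Alternative 1 captures a {...} group on a single line so it can be kept.
# Alternative 2 matches a '#' outside braces together with the rest of the line.
PATTERN = re.compile(r"(\{.*?\})|#.*")


def strip_hash_comments(text: str) -> str:
    """Remove '#' comments, keeping any '#' that sits inside {...}."""
    # Keep the brace group when it matched; otherwise replace with nothing.
    cleaned = PATTERN.sub(lambda m: m.group(1) or "", text)
    # Assumption: trim each line and drop lines that end up empty,
    # as the expected output in the question suggests.
    lines = (line.strip() for line in cleaned.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "This is 4.{#This is 5} # Commit6\n  #Commit7\n{ # a } # b"
    print(strip_hash_comments(sample))
```

A callable is used as the replacement instead of the string r"\1" so the sketch does not depend on how unmatched groups are expanded (Python 3.5 and later substitute an empty string for them). Because the regex engine scans left to right, a # that comes before a { on the same line consumes the rest of the line first, so the braces after it are removed as part of the comment.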

