Remove part of the string between the comments along with the comments

Here's a string. I want to remove the C-style comments from it, including the comment delimiters themselves, without using regex.

a = "word234 /*12aaa12*/"

I want the output to be just:

word234

CodePudding user response：

Here is a simple algorithm that keeps track of the previous character and uses a flag to decide whether to keep each character.

a = "word234 /*12aaa12*/ word123 /*xx*xx*/ end"

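out = []
add = True   # True while we are outside a comment
prev = None  # previous character, reset after each delimiter
for c in a:
    if add and c == '*' and prev == '/':
        # "/*" opens a comment: drop the '/' that was already appended
        del out[-1]
        add = False
        prev = None  # so "/*/" is not mistaken for a complete comment
        continue
    if not add and c == '/' and prev == '*':
        # "*/" closes the comment
        add = True
        prev = None  # so the '/' of "*/" cannot start a new "/*"
        continue
    prev = c
    if add:
        out.append(c)
s2 = ''.join(out)
print(s2)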

Output:

word234  word123  end

If you want to handle nested comments (standard C does not nest them, but this is fun to do), the algorithm is easy to modify: replace the flag with a counter that tracks the nesting depth:

a = "word234 /*12aaa12*/ word123 /*xx/*yy*/xx*/ end"

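out = []
lvl = 0      # nesting depth; 0 means outside any comment
prev = None  # previous character, reset after each delimiter
for c in a:
    if c == '*' and prev == '/':
        # "/*" opens a comment (possibly a nested one)
        if lvl == 0:
            del out[-1]  # drop the '/' that was already appended
        lvl += 1
        prev = None
        continue
    if c == '/' and prev == '*' and lvl > 0:
        # "*/" closes the innermost open comment
        lvl -= 1
        prev = None
        continue
    prev = c
    if lvl == 0:
        out.append(c)
s2 = ''.join(out)
print(s2)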
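
For reuse, the depth-counting loop can be wrapped in a function. This is only a sketch: the name strip_nested_comments is made up for illustration, and it assumes that a stray "*/" outside any comment should be kept as ordinary text.

```python
def strip_nested_comments(text):
    """Remove /* ... */ comments, allowing them to nest."""
    out = []
    depth = 0      # 0 means we are outside every comment
    prev = None    # previous character, or None right after a delimiter
    for c in text:
        if c == '*' and prev == '/':
            if depth == 0:
                out.pop()      # drop the '/' that was already kept
            depth += 1
            prev = None
            continue
        if c == '/' and prev == '*' and depth > 0:
            depth -= 1
            prev = None
            continue
        prev = c
        if depth == 0:
            out.append(c)
    return ''.join(out)

print(strip_nested_comments("word234 /*12aaa12*/ word123 /*xx/*yy*/xx*/ end"))
# word234  word123  end
print(strip_nested_comments("a/*b/*c*/d*/e"))
# ae
```

Resetting prev to None after each delimiter keeps one character from being counted in two delimiters, for example in "/*/" or "*/*".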

CodePudding user response：

You can use str.find to search for occurrences of /* and */ in the string.

str.find returns the index of the first occurrence of the substring, or -1 if it is not found; an optional second argument sets where the search starts. We can use -1 as the stop condition of a loop that searches for the next comment until there are none left.
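
As a quick illustration of how str.find behaves on a single comment (the sample string here is just an example):

```python
s = 'text/*comment*/text'

i = s.find('/*')             # index of the first "/*"
j = s.find('*/', i + 2)      # look for "*/" only after the opener
print(i, j)                  # 4 13
print(s[:i] + s[j + 2:])     # texttext
print(s.find('/*', j + 2))   # -1, no further comment
```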

Then, we can use these indices with str.join to join all the non-comment substrings into one string.

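def indices_c_comments(s):
    # Yields slice boundaries of the non-comment parts: start, end, start, end, ...
    yield 0
    i = s.find('/*')
    while i != -1:
        j = s.find('*/', i + 2)
        # An unterminated comment is assumed to run to the end of the string.
        end = j + 2 if j != -1 else len(s)
        yield from (i, end)
        i = s.find('/*', end)
    yield len(s)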

def strip_c_comments(s):
    g = indices_c_comments(s)
    return ''.join(s[i:j] for i,j in zip(g, g))

for s in ('text/*comment*/text/*comment*/text', 'text/*comment*//*comment*/text', 'text/*comment*/', '/*comment*/'):
    print('"{}"  -->  "{}"'.format(s, strip_c_comments(s)))
# "text/*comment*/text/*comment*/text"  -->  "texttexttext"
# "text/*comment*//*comment*/text"  -->  "texttext"
# "text/*comment*/"  -->  "text"
# "/*comment*/"  -->  ""
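
The zip(g, g) call in strip_c_comments works because both arguments are the same generator, so each tuple consumes two consecutive values from it. A small sketch of that pairing, using a hypothetical helper named pairs and the boundaries you would get for 'text/*comment*/text':

```python
def pairs(iterable):
    # Both zip arguments share one iterator, so items come out two at a time.
    it = iter(iterable)
    return list(zip(it, it))

s = 'text/*comment*/text'
bounds = [0, 4, 15, 19]   # start, end, start, end of the non-comment parts
print(pairs(bounds))      # [(0, 4), (15, 19)]
print(''.join(s[i:j] for i, j in pairs(bounds)))  # texttext
```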