Here's a string. I want to remove the C-style comments from it, including the comment text itself, without using regex.
a = "word234 /*12aaa12*/"
I want the output to be just:
word234
CodePudding user response:
Here is a simple algorithm that tracks state across two consecutive characters and uses a flag to decide whether to keep each character.
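a = "word234 /*12aaa12*/ word123 /*xx*xx*/ end"
out = []
add = True   # True while we are outside a comment
prev = None  # previous character, used to detect "/*" and "*/"
for c in a:
    if c == '*' and prev == '/':
        # "/*" opens a comment; drop the '/' that was already appended.
        if add:
            del out[-1]
        add = False
        prev = None  # the '*' must not also close the comment, as in "/*/"
        continue
    if c == '/' and prev == '*':
        # "*/" closes the comment.
        add = True
        prev = None  # the '/' must not also open a comment, as in "*/*"
        continue
    prev = c
    if add:
        out.append(c)
s2 = ''.join(out)
print(s2)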
Output:
word234  word123  end
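Note that the spaces on both sides of each removed comment survive, so the result contains doubled (or trailing) spaces. As a minimal sketch, the same scan can be packaged as a function with an optional whitespace clean-up step. The name strip_comments and the collapse_spaces parameter are made up for this illustration, and the sketch assumes that a stray "*/" outside a comment should be left untouched.

```python
def strip_comments(text, collapse_spaces=False):
    """Remove /* ... */ comments from text without using regex."""
    out = []
    in_comment = False
    prev = None  # previous character, or None right after a delimiter
    for c in text:
        if not in_comment and prev == '/' and c == '*':
            out.pop()          # drop the '/' appended on the previous step
            in_comment = True
            prev = None        # the '*' must not also close the comment
            continue
        if in_comment and prev == '*' and c == '/':
            in_comment = False
            prev = None        # the '/' must not also open a new comment
            continue
        prev = c
        if not in_comment:
            out.append(c)
    result = ''.join(out)
    if collapse_spaces:
        # Assumption: runs of whitespace may be merged into single spaces.
        result = ' '.join(result.split())
    return result


print(repr(strip_comments("word234 /*12aaa12*/")))                        # 'word234 '
print(repr(strip_comments("word234 /*12aaa12*/", collapse_spaces=True)))  # 'word234'
```

With collapse_spaces=True the question's example yields exactly word234, at the cost of also merging any whitespace runs that were in the original text.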
If you want to handle nested comments (not sure if these exist, but this is fun to do), the algorithm is easy to modify: replace the flag with a counter that tracks the nesting depth:
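a = "word234 /*12aaa12*/ word123 /*xx/*yy*/xx*/ end"
out = []
lvl = 0      # current nesting depth; 0 means outside any comment
prev = None  # previous character, used to detect "/*" and "*/"
for c in a:
    if c == '*' and prev == '/':
        # "/*" opens a comment; at depth 0 the '/' was already appended.
        if lvl == 0:
            del out[-1]
        lvl += 1
        prev = None  # the '*' must not also close the comment
        continue
    if c == '/' and prev == '*' and lvl > 0:
        # "*/" closes the innermost open comment.
        lvl -= 1
        prev = None  # the '/' must not also open a comment
        continue
    prev = c
    if lvl == 0:
        out.append(c)
s2 = ''.join(out)
print(s2)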
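To make the nested variant easier to check, here is a hedged sketch of the depth-counting scan wrapped in a function. The name strip_nested_comments is invented for this example, and two behaviors are assumptions rather than anything stated above: a stray "*/" at depth 0 is kept as ordinary text, and an unterminated comment swallows the rest of the string.

```python
def strip_nested_comments(text):
    """Remove /* ... */ comments, allowing comments nested inside comments."""
    out = []
    depth = 0    # number of comments currently open
    prev = None  # previous character, or None right after a delimiter
    for c in text:
        if prev == '/' and c == '*':
            if depth == 0:
                out.pop()  # drop the '/' appended on the previous step
            depth += 1
            prev = None
            continue
        if depth > 0 and prev == '*' and c == '/':
            depth -= 1
            prev = None
            continue
        prev = c
        if depth == 0:
            out.append(c)
    return ''.join(out)


print(repr(strip_nested_comments("word234 /*12aaa12*/ word123 /*xx/*yy*/xx*/ end")))
# 'word234  word123  end'
print(repr(strip_nested_comments("a/*b/*c*/d*/e")))
# 'ae'
```

Standard C does not nest block comments, so for real C source the flag-based version is the closer match; the counter only matters for languages or formats that do allow nesting.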
CodePudding user response:
You can use str.find to search for occurrences of /* and */ in the string. str.find returns the index of each /* and */, and returns -1 if it doesn't find /* in the string. We can use that as a stop condition in a loop, searching for the next comment until there are no more comments.
Then, we can use these indices with str.join to join all the non-comment substrings into one string.
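def indices_c_comments(s):
    # Yields the boundaries of the non-comment substrings, two at a time.
    yield 0
    i = s.find('/*')
    while i != -1:
        # Search after the opener so that "/*/" is not taken as a full comment.
        j = s.find('*/', i + 2)
        # Assumption: an unterminated comment runs to the end of the string.
        end = len(s) if j == -1 else j + 2
        yield from (i, end)
        i = s.find('/*', end)
    yield len(s)

def strip_c_comments(s):
    g = indices_c_comments(s)
    # zip(g, g) pairs consecutive indices: (start, stop) of each kept substring.
    return ''.join(s[i:j] for i, j in zip(g, g))

for s in ('text/*comment*/text/*comment*/text', 'text/*comment*//*comment*/text', 'text/*comment*/', '/*comment*/'):
    print('"{}" --> "{}"'.format(s, strip_c_comments(s)))
# "text/*comment*/text/*comment*/text" --> "texttexttext"
# "text/*comment*//*comment*/text" --> "texttext"
# "text/*comment*/" --> "text"
# "/*comment*/" --> ""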
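The same search-and-join idea can also be sketched with str.partition, which splits a string around the first occurrence of a separator and avoids manual index arithmetic. This is an alternative illustration, not the approach of the answer above; the function name strip_c_comments_partition is hypothetical, and dropping everything after an unterminated /* is an assumption.

```python
def strip_c_comments_partition(s):
    """Remove /* ... */ comments using str.partition instead of str.find."""
    parts = []
    while True:
        # Split off the text before the next comment opener.
        before, opener, rest = s.partition('/*')
        parts.append(before)
        if not opener:
            break  # no more comments
        # Skip the comment body; continue with whatever follows the closer.
        _, closer, s = rest.partition('*/')
        if not closer:
            break  # assumption: an unterminated comment swallows the rest
    return ''.join(parts)


for s in ('text/*comment*/text/*comment*/text', 'text/*comment*//*comment*/text'):
    print('"{}" --> "{}"'.format(s, strip_c_comments_partition(s)))
# "text/*comment*/text/*comment*/text" --> "texttexttext"
# "text/*comment*//*comment*/text" --> "texttext"
```

Because the closer is searched for only in the text after the opener, a string such as a/*/b is treated as an unterminated comment rather than a complete one.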