I am reading a text file using Python, but I would like to delete any text in the file that comes between /* and /*.
That is how I am starting my code:
import json
record = []
with open('txtfile.txt', 'r') as f:
    for line in f:
        ...  # the rest of my processing goes here
My text file begins with:
/*
Changelog:
2019-11-19: Modification reading tickets 25/1
2015-02-22: ticket number 001433
/*
I don't want to read those lines before continuing my task. I would like to remove any text that comes in between /* and /* if any is found.
CodePudding user response:
This code will open the file and print all the non-comment lines. If you'd like them written to a file you can open a file for writing and do a write instead of a print.
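I've assumed you only have cases like the one in your example text: the comments are surrounded by /* and */, each on a line of its own. This approach will not work for comments like /* comment stuff */, but it could be extended to support them. Note that the sample in the question closes the block with /* rather than */; if that is really what the file contains, compare against "/*" in the inner loop instead, otherwise everything up to the end of the file is skipped.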
with open("txtfile.txt") as f:
    lines_iter = iter(f)
    try:
        while True:
            line = next(lines_iter)
            if line.strip() != "/*":
                print(line, end="")
            else:  # In the comment block.
                # Consume lines until the closing delimiter is reached.
                while next(lines_iter).strip() != "*/":
                    pass
    except StopIteration:
        # End of file reached (possibly inside an unclosed comment).
        pass
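One possible way to extend this to inline comments such as /* comment stuff */ is a regular expression applied to the whole file contents. The following is only a sketch under some assumptions: the file is small enough to read into memory, comments are opened with /* and closed with */, and /* never appears inside data that should be kept. The name remove_block_comments is made up for illustration.

```python
import re

# Non-greedy match from "/*" to the nearest "*/".
# re.DOTALL lets "." match newlines, so multi-line comments are covered too.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

def remove_block_comments(text):
    """Return text with every /* ... */ span removed."""
    return COMMENT_RE.sub("", text)

sample = "keep /* inline */ this\n/*\nChangelog:\n*/\nand this\n"
print(remove_block_comments(sample), end="")
```

Only the comment itself is removed, so a comment that sat on lines of its own leaves an empty line behind. An unclosed /* is left untouched, because the pattern needs a closing */ to match.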
CodePudding user response:
This assumes that there is only ever (at most) one pair of /* delimiters.
Read the entire file into memory and look for the first delimiter. If it is found, look for the second delimiter. If that is found too, reconstruct the text using ordinary slicing.
Assume the file contents are:
Banana
/*
Changelog:
2019-11-19: Modification reading tickets 25/1
2015-02-22: ticket number 001433
/*
Hello world
Goodbye cruel world
The end
Then:
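FILENAME = 'txtfile.txt'
DELIMITER = '/*'
with open(FILENAME) as f:
    data = f.read()
    # The := operator requires Python 3.8 or later.
    while (p1 := data.find(DELIMITER)) >= 0:
        if (p2 := data.find(DELIMITER, p1 + len(DELIMITER))) >= 0:
            # Keep everything before the first delimiter and everything after
            # the second one; the extra 1 skips the newline that follows it.
            data = data[:p1] + data[p2 + len(DELIMITER) + 1:]
        else:
            break
    print(data)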
Output:
Banana
Hello world
Goodbye cruel world
The end
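If there really is at most one pair of delimiters, the same idea can also be sketched with str.partition instead of index arithmetic. This is just an alternative illustration, not a replacement for the code above; the function name remove_first_block and the shortened sample string are made up for the example.

```python
def remove_first_block(data, delimiter="/*"):
    """Remove the first delimiter-bounded block, if a complete pair exists."""
    before, opened, rest = data.partition(delimiter)
    if not opened:
        return data  # no opening delimiter at all
    _, closed, after = rest.partition(delimiter)
    if not closed:
        return data  # the opening delimiter is never closed
    # Drop the newline that followed the closing delimiter, if any.
    if after.startswith("\n"):
        after = after[1:]
    return before + after

sample = (
    "Banana\n"
    "/*\n"
    "Changelog:\n"
    "2015-02-22: ticket number 001433\n"
    "/*\n"
    "Hello world\n"
    "The end\n"
)
print(remove_first_block(sample), end="")
```

Text without a complete pair of delimiters is returned unchanged.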
CodePudding user response:
Interesting problem. I have tried using a sort of toggle that skips rows after the /*. Please refer to the inline comments for details:
with open(r"C:\Temp\my_text.log") as log:
    # init toggle
    toggle = True
    for row in log:
        # If the delimiter row is found, toggle the variable off to
        # ignore subsequent rows until the delimiter is encountered
        # again, then toggle it back on.
        if row.strip() == "/*":
            toggle = not toggle
        # If toggled on, make sure the row is not the delimiter itself.
        if toggle and row.strip() != "/*":
            print(row.strip())
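Since the goal in the question is to carry on processing the remaining lines rather than print them, the toggle could also be wrapped in a generator. This is a rough sketch; the name lines_outside_blocks and the sample list are invented for illustration, and it assumes the delimiter always sits on a line of its own.

```python
def lines_outside_blocks(lines, delimiter="/*"):
    """Yield only the lines that are not inside a delimiter-bounded block."""
    inside = False
    for line in lines:
        if line.strip() == delimiter:
            inside = not inside  # entering or leaving a block
            continue             # never yield the delimiter line itself
        if not inside:
            yield line

sample = ["/*\n", "Changelog:\n", "2015-02-22: ticket number 001433\n", "/*\n", "first data line\n"]
for line in lines_outside_blocks(sample):
    print(line, end="")
```

An open file object can be passed in place of the list, for example `for line in lines_outside_blocks(f):` inside the `with open(...)` block from the question. Unlike the loop above, this version leaves the whitespace of each line as it was.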
CodePudding user response:
Here is a little JSON parser to play with. It knows whether it is parsing inside a string or not. It is not a ready-to-use solution, but all you need to do is remove the parts beginning one character before the transition to ML_COMMENT and ending at the position where the transition from ML_COMMENT occurs.
import enum

State = enum.Enum('State', 'JSON STRING ESCAPE SLASH ML_COMMENT ML_COMMENT_END')

# Transition table: current state -> {character: next state}.
# The 'default' key covers every character not listed explicitly.
TRANS = {
    State.JSON: {
        '/': State.SLASH,
        '"': State.STRING,
    },
    State.STRING: {
        '\\': State.ESCAPE,
        '"': State.JSON,
    },
    State.ESCAPE: {
        'default': State.STRING,
    },
    State.SLASH: {
        '*': State.ML_COMMENT,
        'default': State.JSON,
    },
    State.ML_COMMENT: {
        '*': State.ML_COMMENT_END,
    },
    State.ML_COMMENT_END: {
        '/': State.JSON,
        '*': State.ML_COMMENT_END,
        'default': State.ML_COMMENT,
    },
}

def strip(json_str):
    state = State.JSON
    for ch in json_str:
        trans = TRANS[state]
        try:
            new = trans[ch]
        except KeyError:
            # No explicit entry: use the default, or () meaning "stay put".
            new = trans.get('default', ())
        if isinstance(new, State) and new != state:
            print(f"{ch} {state} -> {new}")
            state = new
        else:
            print(ch)

TEST = r'[10/*comment*/, /***also comment***/ "/*not comment*/", "abc\"/*still in string*/",null]'
strip(TEST)
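To go from tracing the transitions to actually removing the comments, one option is to collect the characters seen outside the comment states. The version below is only a sketch of that idea and not the parser above: it uses the same six states but writes the transitions as plain if/elif branches, and the name strip_comments as well as the handling of a lone slash outside a string are assumptions of mine.

```python
import enum
import json

State = enum.Enum("State", "JSON STRING ESCAPE SLASH ML_COMMENT ML_COMMENT_END")

def strip_comments(text):
    """Return text without /* ... */ comments, leaving string literals intact."""
    out = []
    state = State.JSON
    for ch in text:
        if state is State.JSON:
            if ch == "/":
                state = State.SLASH  # may start a comment: hold the slash back
            else:
                out.append(ch)
                if ch == '"':
                    state = State.STRING
        elif state is State.STRING:
            out.append(ch)
            if ch == "\\":
                state = State.ESCAPE
            elif ch == '"':
                state = State.JSON
        elif state is State.ESCAPE:
            out.append(ch)           # escaped character, still inside the string
            state = State.STRING
        elif state is State.SLASH:
            if ch == "*":
                state = State.ML_COMMENT  # comment confirmed: drop the held slash
            elif ch == "/":
                out.append("/")      # emit the earlier slash, hold this one
            else:
                out.append("/" + ch)  # it was an ordinary slash after all
                state = State.STRING if ch == '"' else State.JSON
        elif state is State.ML_COMMENT:
            if ch == "*":
                state = State.ML_COMMENT_END
        else:  # State.ML_COMMENT_END
            if ch == "/":
                state = State.JSON   # comment closed
            elif ch != "*":
                state = State.ML_COMMENT
    if state is State.SLASH:
        out.append("/")              # trailing slash that never became a comment
    return "".join(out)

TEST = r'[10/*comment*/, /***also comment***/ "/*not comment*/", "abc\"/*still in string*/",null]'
print(json.loads(strip_comments(TEST)))
```

An unterminated comment is dropped through to the end of the input, which may or may not be what you want.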