I am reading a text file using Python, and I would like to delete any text in the file that comes between /* and /*.

This is how my code starts:

import json
record = []
with open('txtfile.txt', 'r') as f:
    for line in f:
        ...  # processing of each line goes here

My text file begins with:

/*
Changelog:
2019-11-19: Modification reading tickets 25/1
2015-02-22: ticket number 001433
/*

I don't want to read those lines before continuing my task. I would like to remove any text that comes in between /* and /* if any is found.

CodePudding user response：

This code will open the file and print all the non-comment lines. If you'd like them written to a file you can open a file for writing and do a write instead of a print.

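I've assumed you only have cases like the one in your example text: the comments are surrounded by /* and */, each on a line of its own. (Your sample closes the block with a second /*; if that is literal rather than a typo, compare against "/*" in the inner loop instead.) This approach will not work for comments like /* comment stuff */, but it could be extended to support them.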

with open("txtfile.txt") as f:
    lines_iter = iter(f)
    
    try:
        while True:
            line = next(lines_iter)

            if line.strip() != "/*":
                print(line, end="")

            else:  # In the comment block.
                while next(lines_iter).strip() != "*/":
                    pass

    except StopIteration:
        pass
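
One possible way to extend this to inline comments such as /* comment stuff */ is a regular expression over the whole text. This is only a sketch: `remove_block_comments` is a hypothetical helper name, it assumes the closing delimiter is */, and it does not know about quoted strings, so a /* inside a string literal would also be removed.

```python
import re

# "/*", then as few characters as possible (newlines included), then "*/".
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def remove_block_comments(text):
    """Return text with every /* ... */ span removed (hypothetical helper)."""
    return COMMENT_RE.sub("", text)


sample = "a /* inline */ b\n/*\nChangelog:\n*/\nc\n"
print(remove_block_comments(sample), end="")
```

Because the pattern needs both delimiters to match, an unterminated /* is left in the text unchanged. The line break that followed a removed block stays behind, so a blank line may remain where the block was.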

CodePudding user response：

This assumes that there is only ever (at most) one pair of /* delimiters.

Read the entire file into memory. Look for the first delimiter. If it is found, look for the second delimiter. If that is found too, reconstruct the text by slicing out everything between them.

Assume the file contents are:

Banana
/*
Changelog:
2019-11-19: Modification reading tickets 25/1
2015-02-22: ticket number 001433
/*
Hello world
Goodbye cruel world
The end

Then (the := operator needs Python 3.8 or later):

FILENAME = 'txtfile.txt'
DELIMITER = '/*'

with open(FILENAME) as f:
    data = f.read()
    while (p1 := data.find(DELIMITER)) >= 0:
        if (p2 := data.find(DELIMITER, p1 + len(DELIMITER))) >= 0:
            data = data[:p1] + data[p2 + len(DELIMITER) + 1:]
        else:
            break
    print(data)

Output:

Banana
Hello world
Goodbye cruel world
The end

CodePudding user response：

Interesting problem. I used a toggle that switches printing off at the first /* and back on at the next one. Please refer to the inline comments for details:

with open(r"C:\Temp\my_text.log") as log:
    # Start with output enabled.
    toggle = True
    for row in log:
        # Each delimiter line flips the toggle: the first one switches
        # output off, the next one switches it back on.
        if row.strip() == "/*":
            toggle = not toggle
        # Print only while output is enabled, and never print the
        # delimiter line itself.
        if toggle and row.strip() != "/*":
            print(row.strip())
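
If the remaining lines are needed for further processing rather than printing, the same toggle idea could be wrapped in a generator. This is a sketch under the same assumptions as above: the delimiter sits on a line of its own and the same string opens and closes the block. The name `skip_delimited` is made up for illustration.

```python
def skip_delimited(lines, delimiter="/*"):
    """Yield the lines that fall outside delimiter-bounded blocks.

    Hypothetical helper: the same delimiter opens and closes a block
    and must be the only non-whitespace text on its line.
    """
    inside = False
    for line in lines:
        if line.strip() == delimiter:
            inside = not inside  # flip on every delimiter line
            continue             # never yield the delimiter itself
        if not inside:
            yield line


sample = ["Banana\n", "/*\n", "Changelog:\n", "/*\n", "Hello world\n"]
print("".join(skip_delimited(sample)), end="")
```

Any iterable of lines works, including an open file object, so `for line in skip_delimited(f):` could replace the plain `for line in f:` loop from the question.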

CodePudding user response：

Here is a little JSON parser to play with. It tracks whether it is currently inside a string. It is not a ready-to-use solution, but all that remains is to remove the parts that begin one character before the transition into ML_COMMENT and end at the position where the transition out of the comment occurs.

import enum

State = enum.Enum('State', 'JSON STRING ESCAPE SLASH ML_COMMENT ML_COMMENT_END')

TRANS = { 
    State.JSON: {
        '/': State.SLASH,
        '"': State.STRING,
    },  
    State.STRING: {
        '\\': State.ESCAPE,
        '"': State.JSON,
    },  
    State.ESCAPE: {
        'default': State.STRING,
    },  
    State.SLASH: {
        '*': State.ML_COMMENT,
        'default': State.JSON,
    },  
    State.ML_COMMENT: {
        '*': State.ML_COMMENT_END,
    },  
    State.ML_COMMENT_END: {
        '/': State.JSON,
        '*': State.ML_COMMENT_END,
        'default': State.ML_COMMENT,
    },  
}

def strip(json_str):
    state = State.JSON
    for ch in json_str:
        trans = TRANS[state]
        try:
            new = trans[ch]
        except KeyError:
            new = trans.get('default', ()) 
        if isinstance(new, State) and new != state:
            print(f"{ch} {state} -> {new}")
            state = new 
        else:
            print(ch)

TEST=r'[10/*comment*/, /***also comment***/ "/*not comment*/", "abc\"/*still in string*/",null]'

strip(TEST)
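
The removal step described above could look roughly like the following. It is a sketch, not the answer's own code: `strip_comments` is a hypothetical name, the transitions are written as if/elif branches instead of the TRANS table, and a slash is held back until the next character shows whether it opens a comment. It assumes /* ... */ style comments, as in the test string.

```python
import enum

State = enum.Enum("State", "JSON STRING ESCAPE SLASH ML_COMMENT ML_COMMENT_END")


def strip_comments(json_str):
    """Return json_str without /* ... */ comments; string contents are kept."""
    out = []
    state = State.JSON
    for ch in json_str:
        if state is State.JSON:
            if ch == "/":
                state = State.SLASH          # hold the slash back for now
            else:
                out.append(ch)
                if ch == '"':
                    state = State.STRING
        elif state is State.STRING:
            out.append(ch)
            if ch == "\\":
                state = State.ESCAPE
            elif ch == '"':
                state = State.JSON
        elif state is State.ESCAPE:
            out.append(ch)                   # escaped character, taken literally
            state = State.STRING
        elif state is State.SLASH:
            if ch == "*":
                state = State.ML_COMMENT     # comment starts, drop the held slash
            elif ch == "/":
                out.append("/")              # emit the old slash, hold the new one
            else:
                out.append("/" + ch)         # it was an ordinary slash after all
                state = State.STRING if ch == '"' else State.JSON
        elif state is State.ML_COMMENT:
            if ch == "*":
                state = State.ML_COMMENT_END
        elif state is State.ML_COMMENT_END:
            if ch == "/":
                state = State.JSON           # comment closed
            elif ch != "*":
                state = State.ML_COMMENT
    if state is State.SLASH:
        out.append("/")                      # input ended on a held-back slash
    return "".join(out)


TEST = r'[10/*comment*/, /***also comment***/ "/*not comment*/", "abc\"/*still in string*/",null]'
print(strip_comments(TEST))
```

An unterminated comment is dropped up to the end of the input. Whether that is the desired behaviour depends on the data, so treat it as a choice made for this sketch.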