How to extract data from a text file between two matched lines in Python


I have an example text file of more than 35,000 lines containing a repeating pattern like the one below. How do I write Python code to extract the data between two lines?

Violator was running
MaxSelect
Modified by Violator
some lines
some more lines
Violator was running
Code
fixed
Modified by Violator

I want to read the file and extract the data between Violator was running and Modified by Violator (for example, the line Code), and write that data to a new output.txt file. The same Violator marker strings appear throughout the text file; I just want to extract the data between them. Please help.

with open('example.txt', 'r') as rf:
   output = rf.readlines()
   s = len(output) - 1
   gen ="Violator was running"
   show = "Modified by Violator"
   for count, line in enumerate(rf,start=1):
      if re.match(gen, line) and re.match(show):
         print(rf.readlines())

This is what I have tried.

CodePudding user response:

You can loop through the lines to collect the indexes of each starting point (Violator was running) and each ending point (Modified by Violator), and then take the lines between each start/end index pair.

lines = [
"Violator was running",
"MaxSelect",
"Modified by Violator",
"some lines",
"some more lines",
"Violator was running",
"Code",
"fixed",
"Modified by Violator",
]

starts = []
ends = []

for idx, line in enumerate(lines):
    if line == "Violator was running":
        starts.append(idx)
    elif line == "Modified by Violator":
        ends.append(idx)
    else:
        continue

groups = []
for start, end in zip(starts, ends):
    group = lines[start + 1:end]
    groups.append(group)
    
print(groups)

Output:

[['MaxSelect'], ['Code', 'fixed']]
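The same pairing approach can be applied directly to the file from the question and written out to output.txt. A minimal sketch, assuming the markers always appear in balanced start/end pairs (the sample input is written to example.txt first so the snippet is self-contained):

```python
# Sample input from the question, written to example.txt so the
# snippet can be run as-is.
sample = """Violator was running
MaxSelect
Modified by Violator
some lines
some more lines
Violator was running
Code
fixed
Modified by Violator
"""
with open("example.txt", "w") as f:
    f.write(sample)

# Collect the indexes of each start and end marker.
starts, ends = [], []
with open("example.txt") as f:
    lines = [line.rstrip("\n") for line in f]

for idx, line in enumerate(lines):
    if line == "Violator was running":
        starts.append(idx)
    elif line == "Modified by Violator":
        ends.append(idx)

# Write the lines between each start/end pair to output.txt.
with open("output.txt", "w") as out:
    for start, end in zip(starts, ends):
        for line in lines[start + 1:end]:
            out.write(line + "\n")
```

Note that `zip(starts, ends)` silently drops any unmatched marker, so a file ending with an unclosed "Violator was running" block would lose that block's lines.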

CodePudding user response:

For simple tasks, I would recommend regex, but as you mentioned, this file is huge, and we should avoid loading it into memory.

Processing the file line-by-line is easy as others have mentioned, but you need to do the filtering yourself.

Quick-n-dirty solution, but a workable starting point:

with open("file_location") as infile:
   save_line = False
   out_lines = []
   for no, line in enumerate(infile):
      if line == "Violator was running\n":
         save_line = True
      elif line == "Modified by Violator\n":
         save_line = False
      elif save_line:
         out_lines.append(f"Line {no} - '{line[:-1]}'\n")
with open("out_file", "w") as outfile:
   for line in out_lines:
      outfile.write(line)

CodePudding user response:

I think this answer is clearer:

start = 'Violator was running'
end = 'Modified by Violator'
output = []

with open('text.txt') as f:
    lines = [line.rstrip() for line in f]

    for index, string in enumerate(lines):
        if start in string:
            for item in lines[index + 1:]:
                if end not in item:
                    output.append(item)
                else:
                    break


with open('output.txt', 'w') as f:
    # The lines were rstrip()-ed, so add the newlines back;
    # writelines() does not insert separators itself.
    f.writelines(line + '\n' for line in output)

CodePudding user response:

If you want to read the file line by line, you can do this.

with open(filename) as file:
    for line in file:
        print(line.rstrip())

If you want to load the file into memory and use the lines one by one, then you can use the following code.

with open(filename) as file:
    lines = file.readlines()
    lines = [line.rstrip() for line in lines]
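Combining the ideas above: for a 35,000+ line file, a generator that streams the file once and yields only the lines between the markers keeps memory usage flat. A sketch under the question's marker strings and file names (the function name extract_between is my own; the sample file is created first so the snippet runs standalone):

```python
# Create the sample input from the question so the snippet runs as-is.
sample = """Violator was running
MaxSelect
Modified by Violator
some lines
some more lines
Violator was running
Code
fixed
Modified by Violator
"""
with open("example.txt", "w") as f:
    f.write(sample)

def extract_between(path, start="Violator was running",
                    end="Modified by Violator"):
    """Yield the lines strictly between each start/end marker pair."""
    inside = False
    with open(path) as f:
        for line in f:
            stripped = line.rstrip("\n")
            if stripped == start:
                inside = True
            elif stripped == end:
                inside = False
            elif inside:
                yield stripped

# Stream straight to the output file without loading the input.
with open("output.txt", "w") as out:
    for line in extract_between("example.txt"):
        out.write(line + "\n")
```

Because the generator yields lazily, the whole pipeline holds only one line in memory at a time, regardless of file size.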