I have a text file in this format:
000000.png 712,143,810,307,0
000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
I want to keep only the blocks [number1,number2,number3,number4,number5] whose last number is 1 or 5. Every other block should be stripped from its line, and any line left with no matching block should be removed entirely.
The above text file should look like this in the end:
000001.png 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
My code:
import os
with open("data.txt", "r") as input:
    with open("newdata.txt", "w") as output:
        # iterate all lines from file
        for line in input:
            # if substring contain in a line then don't write it
            if ",0" or ",2" or ",3" or ",4" or ",6" not in line.strip("\n"):
                output.write(line)
I tried something like this, but it didn't work: every line is still written to the output file.
CodePudding user response:
No need for regex; this might help you:
with open("data.txt", "r") as input:  # Read all data lines.
    data = input.readlines()
with open("newdata.txt", "w") as output:  # Create output file.
    for line in data:  # Iterate over data lines.
        line_elements = line.split()  # Split line by whitespace.
        if not line_elements:  # Skip blank lines (avoids an IndexError below).
            continue
        line_updated = [line_elements[0]]  # Initialize fixed line (without undesired patterns) with image's name.
        for i in line_elements[1:]:  # Iterate over groups of numbers in current line.
            tmp = i.split(',')  # Split current group by commas.
            if len(tmp) == 5 and (tmp[-1] == '1' or tmp[-1] == '5'):
                line_updated.append(i)  # If the pattern is OK, append group to fixed line.
        if len(line_updated) > 1:  # If the fixed line still has a group, write it to output file.
            output.write(f"{' '.join(line_updated)}\n")
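As for why the original attempt wrote every line: in `",0" or ",2" or ... not in line`, the `not in` applies only to the last operand, and the non-empty string `",0"` is truthy, so the whole condition is always true. A plain substring test would also misfire, since `",1"` matches inside a coordinate such as `,143`. The same filtering can be sketched as a small function that is easy to check without touching any files. The name `filter_line` and the `keep` parameter are made up for this illustration; they are not part of the code above.

```python
def filter_line(line, keep=("1", "5")):
    """Return `line` with only the groups whose fifth value is in `keep`.

    Returns None when the line is blank or no group survives.
    `filter_line` and `keep` are illustrative names, not from the original code.
    """
    parts = line.split()
    if not parts:
        return None
    name, groups = parts[0], parts[1:]
    kept = []
    for group in groups:
        fields = group.split(",")
        # Compare the last field exactly instead of searching for a substring.
        if len(fields) == 5 and fields[-1] in keep:
            kept.append(group)
    return " ".join([name] + kept) if kept else None


# The original condition is always true, whatever the line contains:
always_true = bool(",0" or ",2" not in "000002.png 657,190,700,223,1")

sample = [
    "000000.png 712,143,810,307,0",
    "000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5",
    "000002.png 657,190,700,223,1",
]
# Keep only the lines that still have at least one matching group.
filtered = [out for out in map(filter_line, sample) if out is not None]
```

To process a file, call `filter_line` on each line read from `data.txt` and write every non-`None` result followed by a newline to `newdata.txt`.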