Sorry if the title is poorly worded. I have a large file full of data subsets, each with a unique identifier. I want to find the first line containing a given identifier and print that line, along with every line after it, until the next data subset is reached (the line that starts the next subset begins with "<"). The data is structured as shown below.
<ID1|ID1_x
AAA
BBB
CCC
<ID2|ID2_x
DDD
EEE
FFF
<ID3|ID3_x
...
I would like to print:
<(ID2)
DDD
EEE
FFF
So far I have:
with open('file.txt') as f:
    for line in f:
        if 'ID2' in line:
            print(line)
...
CodePudding user response:
Try with the code below:
found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            # the trailing '|' makes the match exact, so 'ID2' does not also match 'ID23'
            if '<ID2|' in line:
                id_line_split = line.split('|')
                id_line = id_line_split[0][1:]
                print('<(' + str(id_line) + ')')
                found_id = True
            else:
                found_id = False
        else:
            if found_id:
                # remove carriage return and line feed
                line = line.replace('\n', '')
                line = line.replace('\r', '')
                print(line)
Running the code above on my system, with your file.txt, produces this output:
<(ID2)
DDD
EEE
FFF
Second question (from comment)
To select both ID2 and ID23 (see the question in the comments on this answer), the program has been changed as follows:
found_id = False
with open('file.txt') as f:
    for line in f:
        if '<ID' in line:
            # the trailing '|' makes each match exact ('<ID2' alone would also match '<ID23')
            if ('<ID2|' in line) or ('<ID23|' in line):
                id_line_split = line.split('|')
                id_line = id_line_split[0][1:]
                print('<(' + str(id_line) + ')')
                found_id = True
            else:
                found_id = False
        else:
            if found_id:
                # remove carriage return and line feed
                line = line.replace('\n', '')
                line = line.replace('\r', '')
                print(line)
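If the list of identifiers keeps growing, the same idea can be wrapped in a small generator instead of adding more `or` conditions. This is only a sketch: the name `extract_blocks` and the `sample` data are made up for illustration, and it assumes every header line starts with "<" and separates the identifier from the rest with "|", as in the question.

```python
def extract_blocks(lines, wanted_ids):
    """Yield the header and body lines of every block whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    keep = False
    for line in lines:
        # remove carriage return and line feed
        line = line.rstrip('\r\n')
        if line.startswith('<'):
            # header such as '<ID2|ID2_x': the ID is the text between '<' and '|'
            block_id = line[1:].split('|', 1)[0]
            # exact comparison, so 'ID2' does not also select 'ID23'
            keep = block_id in wanted
            if keep:
                yield '<(' + block_id + ')'
        elif keep:
            yield line


# hypothetical sample data in the format shown in the question
sample = [
    '<ID1|ID1_x\n', 'AAA\n',
    '<ID2|ID2_x\n', 'DDD\n', 'EEE\n',
    '<ID23|ID23_x\n', 'GGG\n',
]
for out in extract_blocks(sample, {'ID2'}):
    print(out)
```

Because a file object is itself an iterable of lines, the file handle from `open('file.txt')` can be passed directly in place of `sample`, so the whole file never has to be loaded into memory.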