I have an alignment result with multiple sequences in a text file. I want to split each result into its own text file. So far I can detect each sequence by its leading '>' and split the results into files. However, the new text files are written without the line that contains '>'.
with open("result.txt",'r') as fo:
    start=0
    op= ' '
    cntr=1
    # print(fo.readlines())
    for x in fo.readlines():
        # print(x)
        if (x[0]== '>'):
            if (start==1):
                with open(str(cntr) + '.txt','w') as opf:
                    opf.write(op)
                    opf.close()
                    op= ' '
                    cntr += 1
            else:
                start=1
        else:
            if (op==''):
                op=x
            else:
                op= op + '\n' + x
    fo.close()
print('completed')
I want each text file to begin with ">P51051.1 RecName: Full=Melatonin receptor type 1B; Short=Mel-1B-R; Short=Mel1b receptor [Xenopus laevis] Length=152 ", but instead the files start with "receptor [Xenopus laevis] Length=152". How can I include the header from its beginning?
CodePudding user response:
You can do it like this:
with open("result.txt", encoding='utf-8') as fo:
    for index, txt in enumerate(fo.read().split(">")):
        if txt:
            with open(f'{index}.txt', 'w', encoding='utf-8') as opf:
                # split() drops the '>' separator, so write it back
                opf.write('>' + txt)
You should provide the encoding of the file, e.g. utf-8. There is no need to pass the mode 'r', since reading is the default, and no need to close the file when you use a context manager (i.e. with). Use read instead of readlines to get the whole file as one string, then call split on that string. Keep in mind that split removes the '>' separator itself, so it has to be written back if each file should start with it. I'm using enumerate to get a counter alongside each chunk, and an f-string because it is a cleaner way to build the file name than string concatenation.
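If you would rather keep the line-by-line approach from the question, here is a minimal sketch of that idea. It assumes every record starts at a line beginning with '>' and that a wrapped header simply continues on the following lines. The names split_records and write_records are made up for this example, and the 1.txt, 2.txt naming just mirrors the counter in the question.

```python
from pathlib import Path


def split_records(lines):
    """Group lines into records; a line starting with '>' opens a new record."""
    records = []
    current = []
    for line in lines:
        if line.startswith(">") and current:
            # A new header closes the record collected so far.
            records.append("".join(current))
            current = []
        # Keep every line, including the header line itself.
        current.append(line)
    if current:
        # Flush the last record; no later header exists to trigger the write.
        records.append("".join(current))
    return records


def write_records(records, out_dir="."):
    """Write each record to 1.txt, 2.txt, ... inside out_dir (assumed naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for number, record in enumerate(records, start=1):
        (out / f"{number}.txt").write_text(record, encoding="utf-8")


# Usage, assuming the alignment output is stored in result.txt:
# with open("result.txt", encoding="utf-8") as fo:
#     write_records(split_records(fo))
```

The header line is appended to the current record rather than skipped, which is the step missing in the question's loop. Any text that appears before the first '>' line ends up as a record of its own.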