Text parsing with specific element Python

Time:12-28

I have an alignment result with multiple sequences in a text file. I want to split each result into its own text file. So far I can detect each sequence by its '>' marker and split the results into files. However, the new text files are written without the line that contains '>'.

with open("result.txt",'r') as fo:
    start=0
    op= ' '
    cntr=1
    # print(fo.readlines())
    for x in fo.readlines():
        # print(x)
        if (x[0]== '>'):
            if (start==1):
                with open(str(cntr) + '.txt','w') as opf:
                    opf.write(op)
                    opf.close()
                    op= ' '
                    cntr += 1
            else:
                start=1   
        else:
            if (op==''):
                op=x
            else:
                op= op + '\n' + x
    fo.close()
    print('completed') 

">P51051.1 RecName: Full=Melatonin receptor type 1B; Short=Mel-1B-R; Short=Mel1b receptor [Xenopus laevis] Length=152 " is how I want each text file to begin, but instead they start with "receptor [Xenopus laevis] Length=152". How can I include the header line from its beginning?

CodePudding user response:

You can do it like this:

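with open("result.txt", encoding='utf-8') as fo:
    # split(">") removes the delimiter itself, so it is added back on write.
    for index, txt in enumerate(fo.read().split(">")):
        if txt:  # skip the empty chunk before the first '>'
            with open(f'{index}.txt', 'w', encoding='utf-8') as opf:
                opf.write('>' + txt)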

You should specify the file's encoding (e.g. utf-8). There is no need to pass the mode 'r', since reading is the default, and no need to close the file when you use a context manager (i.e. with). Use read instead of readlines to get the whole file as a single string, then call split on that string. enumerate supplies a counter alongside each chunk, and an f-string is a cleaner way to build the file name than string concatenation.
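If a '>' can also appear in the middle of a line, splitting the whole string on that character would cut records in the wrong place. A line-based variant may be safer. The following is only a sketch: split_records and write_records are hypothetical helper names, and it assumes every record starts with a line whose first character is '>' and that anything before the first such line can be discarded.

```python
from pathlib import Path


def split_records(lines):
    """Group lines into records; a new record starts at each line beginning with '>'."""
    records = []
    for line in lines:
        if line.startswith(">"):
            # Header line: start a new record and keep the header itself.
            records.append([line])
        elif records:
            # Body line: attach it to the most recent record.
            records[-1].append(line)
        # Lines before the first header are dropped (assumption).
    return ["".join(record) for record in records]


def write_records(source="result.txt", out_dir="."):
    """Write each record to 1.txt, 2.txt, ... and return how many were written."""
    with open(source, encoding="utf-8") as fo:
        records = split_records(fo)
    for index, record in enumerate(records, start=1):
        Path(out_dir, f"{index}.txt").write_text(record, encoding="utf-8")
    return len(records)
```

Calling write_records() with its defaults reads result.txt from the current directory and writes the numbered files next to it.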
