I'm still very much a beginner coder, and I have been using Python to learn about regexes and write the matches out to .txt files.
This is what I have so far:
{python bizarre_protein,echo=T,eval=T}
bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
import re
pattern = re.compile("[W][A-Z][A-Z][P|R|N][S]{1}")
for m in re.finditer(pattern, bizarre_protein):
    print(m.start(), m.end(), m.group(0))
#start with pattern find W then add 2 A-Z, P|R|N and the S
some_protein = {"motif_start": [m.start(), m.start(), m.start(), m.start(), m.start()], "motif_sequence":[m.group(0), m.group(0), m.group(0), m.group(0), m.group(0)]}
text_lines = [ ]
text_line = "index\t"
for column in some_protein.keys():
    text_line = text_line + column + "\t"
print(text_line)
text_lines.append(text_line)
for i in range(0, len(some_protein[column])):
    text_line = str(i) + "\t"
    for column in some_protein.keys():
        text_line += str(some_protein[column][i])
        text_line += "\t"
    print(text_line)
    text_lines.append(text_line)
out_handle = open("bizarre_protein.txt", "w")
for line in text_lines:
    line = line.rstrip("\t")
    print(line)
    line = line + "\n"
    ignoreme = out_handle.write(line)
ignoreme = out_handle.close()
This is the result I get. It does write to the .txt file I created, but I need it to output all of the rows (3, WARPS through 66, WRAPS) and not just the last one. I have tried quite a few things, but none of them have worked. How do I get it to list all of the rows instead of just the last one? Thanks in advance.
3 8 WARPS
14 19 WHIRS
29 34 WEEPS
43 48 WHIPS
53 58 WHIRS
66 71 WRAPS
# This is what I need in the txt file ^
index motif_start motif_sequence
0 66 WRAPS
1 66 WRAPS
2 66 WRAPS
3 66 WRAPS
4 66 WRAPS
# This is all I get ^
Answer:
Is this the result you expect?
import re

bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
# W, then any two residues, then P, R or N, then S.
# Inside [...] a "|" is a literal character, so [PRN] is enough.
pattern = re.compile("W[A-Z]{2}[PRN]S")

with open("bizarre_protein.txt", "w") as f:
    f.write("index\tmotif_start\tmotif_sequence\n")
    # enumerate() supplies the running index for the first column
    for i, m in enumerate(pattern.finditer(bizarre_protein)):
        print(m.start(), m.end(), m.group(0))
        f.write("{}\t{}\t{}\n".format(i, m.start(), m.group(0)))
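As for why the code in the question only shows the last row: the dictionary is built after the `for` loop has finished, and by then `m` refers only to the final match (66, WRAPS), which is then repeated five times. If you would rather keep the dictionary layout from the question, a minimal sketch is to append to the lists inside the loop. This assumes `[P|R|N]` was meant as "P, R or N" (written `[PRN]` here), and the names `row` and `values` are only illustrative:

```python
import re

bizarre_protein = "MTLWARPSSKRGWYWHIRSSSHEEEGYFVWEEPSTLAVSFLYCWHIPSWHATSWHIRSSSRVADEGWRAPSPLYW"
# Assumption: "[P|R|N]" in the question was meant as "P, R or N".
pattern = re.compile(r"W[A-Z]{2}[PRN]S")

# Fill the columns while looping, so every match is kept,
# not just the one that m points to after the loop ends.
some_protein = {"motif_start": [], "motif_sequence": []}
for m in pattern.finditer(bizarre_protein):
    some_protein["motif_start"].append(m.start())
    some_protein["motif_sequence"].append(m.group(0))

# One header line, then one tab-separated line per match.
text_lines = ["index\t" + "\t".join(some_protein)]
for i in range(len(some_protein["motif_start"])):
    row = [str(i)] + [str(values[i]) for values in some_protein.values()]
    text_lines.append("\t".join(row))

# The with-block closes the file automatically.
with open("bizarre_protein.txt", "w") as out_handle:
    out_handle.write("\n".join(text_lines) + "\n")
```

Joining the fields with `"\t".join(...)` also avoids the trailing tab that the question's code had to strip with `rstrip("\t")`.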