I'm trying to make a program which finds out the longest strand of certain type of subsequence in the DNA, and then write it down into corresponing sub_sequences
dictionary value.
The problem is that even though the loop executes 8 times (as much as i need), the function works only on the first execution. What's wrong here?
import sys
import csv
def main():
# Check for command-line usage
if len(sys.argv) != 3:
exit()
# Read database file into a variable
database = open(sys.argv[1], "r")
# Read DNA sequence file into a variable
DNA_sequence = open(sys.argv[2], "r")
# Makes a dictionary of kinds of sequences
sub_sequences = {0: ["AGATC", 0], 1: ["TTTTTTCT", 0], 2: ["AATG", 0], 3: ["TCTAG", 0], 4: ["GATA", 0], 5: ["TATC", 0], 6: ["GAAA", 0], 7: ["TCTG", 0]}
# Finds the longest match of each STR in DNA sequence
sub_sequences_length = len(sub_sequences)
for i in range(sub_sequences_length):
sub_sequences[i][1] = longest_match(DNA_sequence.read(), sub_sequences[i][0])
return
def longest_match(sequence, subsequence):
"""Returns length of longest run of subsequence in sequence."""
# Initialize variables
longest_run = 0
subsequence_length = len(subsequence)
sequence_length = len(sequence)
# Check each character in sequence for most consecutive runs of subsequence
for i in range(sequence_length):
# Initialize count of consecutive runs
count = 0
# Check for a subsequence match in a "substring" (a subset of characters) within sequence
# If a match, move substring to next potential match in sequence
# Continue moving substring and checking for matches until out of consecutive matches
while True:
# Adjust substring start and end
start = i count * subsequence_length
end = start subsequence_length
# If there is a match in the substring
if sequence[start:end] == subsequence:
count = 1
# If there is no match in the substring
else:
break
# Update most consecutive matches found
longest_run = max(longest_run, count)
# After checking for runs at each character in seqeuence, return longest run found
return longest_run
main()
CodePudding user response:
You call DNA_sequence.read()
multiple times.
The documentation for TextIOBase::read
says:
Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.
In other words, the first time you read the file, it reads the entire file, attemting to call read
again, will result in reading nothing and you get an empty string back.
A simple fix is to store the contents of the file in a variable, and reuse that variable:
# Read DNA sequence file into a variable
with open(sys.argv[2], "r") as dna:
DNA_sequence = dna.read()
Now you can use DNA_sequence
variable which contains the contents of the file:
longest_match(DNA_sequence, sub_sequences[i][0])
CodePudding user response:
If you want your function will reoccur so you have to put your function in while loop and set 1,2 or any thing to run function and exit from function.