Home > Mobile >  Why my function is being executed only once?
Why my function is being executed only once?

Time:08-07

I'm trying to make a program which finds out the longest strand of certain type of subsequence in the DNA, and then write it down into corresponing sub_sequences dictionary value. The problem is that even though the loop executes 8 times (as much as i need), the function works only on the first execution. What's wrong here?

import sys
import csv

def main():

    # Check for command-line usage
    if len(sys.argv) != 3:
        exit()

    # Read database file into a variable
    database = open(sys.argv[1], "r")

    # Read DNA sequence file into a variable
    DNA_sequence = open(sys.argv[2], "r")

    # Makes a dictionary of kinds of sequences
    sub_sequences = {0: ["AGATC", 0], 1: ["TTTTTTCT", 0], 2: ["AATG", 0], 3: ["TCTAG", 0], 4: ["GATA", 0], 5: ["TATC", 0], 6: ["GAAA", 0], 7: ["TCTG", 0]}

    # Finds the longest match of each STR in DNA sequence

    sub_sequences_length = len(sub_sequences)

    for i in range(sub_sequences_length):
        sub_sequences[i][1] = longest_match(DNA_sequence.read(), sub_sequences[i][0])

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i   count * subsequence_length
            end = start   subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count  = 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in seqeuence, return longest run found
    return longest_run


main()

CodePudding user response:

You call DNA_sequence.read() multiple times.

The documentation for TextIOBase::read says:

Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.

In other words, the first time you read the file, it reads the entire file, attemting to call read again, will result in reading nothing and you get an empty string back.


A simple fix is to store the contents of the file in a variable, and reuse that variable:

# Read DNA sequence file into a variable
with open(sys.argv[2], "r") as dna:
  DNA_sequence = dna.read()

Now you can use DNA_sequence variable which contains the contents of the file:

longest_match(DNA_sequence, sub_sequences[i][0])

CodePudding user response:

If you want your function will reoccur so you have to put your function in while loop and set 1,2 or any thing to run function and exit from function.

  • Related