Explanation of code that uses lineIndex to collect reads from a file

Time:01-03

The aim is to build a graph from a collection of strings (reads) in a FASTQ file. First, we implement the following function, which gets the reads. We remove the newline character from the end of each line (with str.strip()) and, by convention, convert all characters in the reads to upper case (with str.upper()). The code for that:

def get_reads(filePath):
    reads = []  # The list of strings that will store the reads (the DNA strings) in the FASTQ file at filePath
    with open(filePath, 'r') as fastqFile:  # the file is closed automatically when this block ends
        fastqLines = fastqFile.readlines()

    for lineIndex in range(1, len(fastqLines), 4): # I want this explained
        line = fastqLines[lineIndex]
        reads.append(line.strip().upper())
        
    return reads

What is the purpose of the line for lineIndex in range(1, len(fastqLines), 4)?

We use this to make a de Bruijn graph from a collection of strings. Can someone explain, please?

CodePudding user response:

fastqLines is a Python list containing every line read from the file. The loop

for lineIndex in range(1, len(fastqLines), 4):

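gives lineIndex the values 1, 5, 9, ... and stops before len(fastqLines). Each value is used to pick a line from fastqLines, which is then stripped, converted to upper case, and appended to the list reads. Because Python lists are indexed from 0, this means that the 2nd, 6th, 10th, ... lines of the file are stored in reads. In the usual four-line FASTQ layout, each record consists of an identifier line starting with @, the sequence, a separator line starting with +, and the quality string, so starting at index 1 with a step of 4 selects exactly the sequence line of every record.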
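To make the indexing concrete, here is a small sketch that runs the same range on a made-up FASTQ sample. The list fastq_lines and its two records are hypothetical stand-ins for what readlines() would return, and the sketch assumes every record occupies exactly four lines.

```python
# Hypothetical two-record FASTQ file, as readlines() would return it.
# Assumption: every record is exactly four lines long.
fastq_lines = [
    "@read1\n",  # index 0: identifier
    "acgt\n",    # index 1: sequence  <- selected
    "+\n",       # index 2: separator
    "IIII\n",    # index 3: quality string
    "@read2\n",  # index 4: identifier
    "ttga\n",    # index 5: sequence  <- selected
    "+\n",       # index 6: separator
    "IIII\n",    # index 7: quality string
]

# Start at index 1 and jump 4 lines at a time.
indices = list(range(1, len(fastq_lines), 4))
print(indices)  # [1, 5]

# Same processing as in get_reads: strip the newline, convert to upper case.
reads = [fastq_lines[i].strip().upper() for i in indices]
print(reads)  # ['ACGT', 'TTGA']

# An extended slice picks the same lines without explicit indices.
same_reads = [line.strip().upper() for line in fastq_lines[1::4]]
print(same_reads == reads)  # True
```

The slice fastq_lines[1::4] is just another way to write the same selection; whether it reads better than the explicit range is a matter of taste.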
