boolean valued function in python

Time:10-19

My function filteredFastaToYear relies on a second, boolean-valued function, filterHeaderToYear, which does not work as intended. I would be very grateful for your help.

I have the following task:

I pass two lists to my first function (filteredFastaToYear):

sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']  
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']

As output I need only the headers from the year 2021 and their corresponding sequences:

sequences_out = ['DEF', 'JKL', 'MNO']  
headers_out = ['2021', '2021', '2021']

I type:

sequences_out, headers_out = filteredFastaToYear('2021', sequences_inp, headers_inp)
print(len(sequences_out), len(headers_out))

But instead of the expected output I get empty lists:

output: 0 0
expected output: 3 3

Function filteredFastaToYear: creates two filtered lists

def filteredFastaToYear(year, listOfSequences, listOfHeaders):

    """ output filtered sequence list, header list """
    filtListOfSequences = []
    filtListOfHeaders = []

    """ fasta filtering """
    for i in range(0, len(listOfHeaders)-1):
        if filterHeaderToYear(year, listOfHeaders[i]) == year:
            filtListOfSequences.append(listOfSequences[i])
            filtListOfHeaders.append(listOfHeaders[i])

    return filtListOfSequences, filtListOfHeaders

Function filterHeaderToYear: checks whether a header belongs to the required year:

def filterHeaderToYear(year, listOfHeaders):

    """ split header, find the needed year """
    for header in listOfHeaders:
        header_split = header.split('-')
        if header_split[0] == year:
            return True

    return False

CodePudding user response:

You could save yourself the bug hunt and do it like this:

[(s, h[:4]) for s, h in zip(sequences_inp, headers_inp) if h[:4] == '2021']
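
The comprehension above produces a list of (sequence, year) pairs, while the question asks for two separate lists. A minimal sketch of one way to unzip them, assuming every header starts with a four-digit year; the name pairs is only illustrative:

```python
sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']

# Keep (sequence, year) pairs whose header starts with the wanted year.
pairs = [(s, h[:4]) for s, h in zip(sequences_inp, headers_inp) if h[:4] == '2021']

# Unzip the pairs into two lists; fall back to empty lists when nothing matched.
sequences_out, headers_out = map(list, zip(*pairs)) if pairs else ([], [])

print(sequences_out)  # ['DEF', 'JKL', 'MNO']
print(headers_out)    # ['2021', '2021', '2021']
```

The empty-list fallback matters because zip(*pairs) yields nothing to unpack when no header matches.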

CodePudding user response:

filterHeaderToYear receives a single header string (listOfHeaders[i]), but the function loops over it as if it were a list of headers:

    for header in listOfHeaders:
        header_split = header.split('-')

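Iterating over a string such as '2021-19' yields its characters one at a time ('2', '0', '2', '1', '-', '1', '9'), so header_split[0] is never equal to the year. In addition, the check if filterHeaderToYear(year, listOfHeaders[i]) == year can never succeed, because the function returns True or False and you are comparing that result with the year string.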
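
Both problems can be reproduced in isolation. This is only an illustrative sketch; the variable names chars and first_parts are not from the original code:

```python
header = '2021-2'  # filteredFastaToYear passes one header string, not a list

# Looping over a string yields single characters, not whole headers.
chars = [c for c in header]
print(chars)  # ['2', '0', '2', '1', '-', '2']

# Splitting each character on '-' never produces the full year.
first_parts = [c.split('-')[0] for c in header]
print(first_parts)  # ['2', '0', '2', '1', '', '2']

# A boolean return value is never equal to the year string.
print(True == '2021')   # False
print(False == '2021')  # False
```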

Also, you are not iterating over the full list when you write

for i in range(0, len(listOfHeaders) - 1)

range excludes its stop value, so this loop ends one position early and never visits the last element.
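
The off-by-one can be seen directly on the question's data. A small sketch, with short and full as illustrative names:

```python
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']

# range(0, n - 1) covers indices 0 .. n - 2, so the last header is skipped.
short = [headers_inp[i] for i in range(0, len(headers_inp) - 1)]

# range(n) covers indices 0 .. n - 1, i.e. the whole list.
full = [headers_inp[i] for i in range(len(headers_inp))]

print(short)  # ['2019-9', '2021-2', '2020-1', '2021-5']
print(full)   # ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']
```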

Try this code

def filteredFastaToYear(year, listOfSequences, listOfHeaders):
    """ output filtered sequence list, header list """
    filtListOfSequences = []
    filtListOfHeaders = []

    """ fasta filtering """
    for i in range(0, len(listOfHeaders)):
        if filterHeaderToYear(year, listOfHeaders[i]):
            filtListOfSequences.append(listOfSequences[i])
            filtListOfHeaders.append(listOfHeaders[i])

    return filtListOfSequences, filtListOfHeaders

def filterHeaderToYear(year, header):

    """ split a single header, check whether it starts with the needed year """
    header_split = header.split('-')
    return header_split[0] == year

sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']

sequences_out, headers_out = filteredFastaToYear('2021', sequences_inp, headers_inp)
print(len(sequences_out), len(headers_out)) # 3 3
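
Note that the code above keeps the full headers ('2021-2', '2021-5', '2021-8'), whereas the expected headers_out in the question contains only the year. If that is what is wanted, a possible variant is sketched below; the function name filter_fasta_to_year and its parameter names are hypothetical, and it assumes headers have the form 'YEAR-NUMBER':

```python
def filter_fasta_to_year(year, sequences, headers):
    """Return the sequences whose header year matches, plus the bare years."""
    filtered_sequences = []
    filtered_years = []
    # zip pairs each sequence with its header, so no index arithmetic is needed.
    for sequence, header in zip(sequences, headers):
        if header.split('-')[0] == year:
            filtered_sequences.append(sequence)
            filtered_years.append(year)  # store only the year, not the full header
    return filtered_sequences, filtered_years


sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']

sequences_out, headers_out = filter_fasta_to_year('2021', sequences_inp, headers_inp)
print(sequences_out)  # ['DEF', 'JKL', 'MNO']
print(headers_out)    # ['2021', '2021', '2021']
```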
