The function (filteredFastaToYear) uses another boolean-valued function (filterHeaderToYear) which does not work. I would be very grateful for your help:
I have the following task:
I give to my first function (filteredFastaToYear) two lists:
sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']
As output I need only the headers from the year 2021 and the corresponding sequences:
sequences_out = ['DEF', 'JKL', 'MNO']
headers_out = ['2021', '2021', '2021']
I type:
sequences_out, headers_out = filteredFastaToYear('2021', sequences_inp, headers_inp)
print(len(sequences_out), len(headers_out))
But instead of the expected output I get empty lists:
output: 0 0
expected output: 3 3
Function filteredFastaToYear: creates two filtered lists
def filteredFastaToYear(year, listOfSequences, listOfHeaders):
    """ output filtered sequence list, header list """
    filtListOfSequences = []
    filtListOfHeaders = []
    """ fasta filtering """
    for i in range(0, len(listOfHeaders)-1):
        if filterHeaderToYear(year, listOfHeaders[i]) == year:
            filtListOfSequences.append(listOfSequences[i])
            filtListOfHeaders.append(listOfHeaders[i])
    return filtListOfSequences, filtListOfHeaders
Function filterHeaderToYear: choose header from a required year:
def filterHeaderToYear(year, listOfHeaders):
    """ split header, find the needed year """
    for header in listOfHeaders:
        header_split = header.split('-')
        if header_split[0] == year:
            return True
    return False
CodePudding user response:
You could save yourself the bug hunt and do it like this:
[(s, h[:4]) for s, h in zip(sequences_inp, headers_inp) if h[:4] == '2021']
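If two separate lists are needed rather than one list of pairs, the filtered pairs can be unpacked with zip. This is only a minimal sketch, assuming the year is the part of the header before the first '-'; the function name filter_by_year is made up for illustration and is not from the question:

```python
def filter_by_year(year, sequences, headers):
    """Return (sequences, headers) for headers whose year part matches."""
    # Keep only the pairs whose header's year part (before the first '-') matches.
    pairs = [(s, h) for s, h in zip(sequences, headers) if h.split('-')[0] == year]
    if not pairs:
        # zip(*[]) cannot be unpacked into two names, so handle the empty case.
        return [], []
    # zip(*pairs) transposes the list of pairs into two tuples.
    seqs, hdrs = zip(*pairs)
    return list(seqs), list(hdrs)


sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']
sequences_out, headers_out = filter_by_year('2021', sequences_inp, headers_inp)
print(sequences_out)  # ['DEF', 'JKL', 'MNO']
print(headers_out)    # ['2021-2', '2021-5', '2021-8']
```

Note that this keeps the full headers; use h[:4] or h.split('-')[0] in the comprehension if only the year should be kept.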
CodePudding user response:
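You pass a single header string to filterHeaderToYear, but the function loops over it as if it were a list:
    for header in listOfHeaders:
        header_split = header.split('-')
Looping over the string '2021-19' yields the single characters
['2', '0', '2', '1', '-', '1', '9'], so header_split[0] is never equal to the year.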
Also, the check
    if filterHeaderToYear(year, listOfHeaders[i]) == year
can never succeed: the function returns True or False, and you are comparing that result with the year string.
You are also not iterating over the full list:
    for i in range(0, len(listOfHeaders) - 1)
This stops one position before the end, so the last header is never checked.
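The three problems above can be reproduced in isolation. The following is just a small sketch using values from the question; the variable names are chosen for illustration only:

```python
header = '2021-2'
year = '2021'

# 1. Iterating over a string yields single characters, not whole headers.
chars = [c for c in header]
print(chars)  # ['2', '0', '2', '1', '-', '2']

# Splitting the whole string is what gives the year part.
year_matches = header.split('-')[0] == year
print(year_matches)  # True

# 2. A boolean never equals the year string, so this comparison is always False.
result = True
print(result == year)  # False

# 3. range(0, n - 1) stops before the last index.
headers = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']
short_range = list(range(0, len(headers) - 1))
full_range = list(range(len(headers)))
print(short_range)  # [0, 1, 2, 3]
print(full_range)   # [0, 1, 2, 3, 4]
```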
Try this code
def filteredFastaToYear(year, listOfSequences, listOfHeaders):
    """ output filtered sequence list, header list """
    filtListOfSequences = []
    filtListOfHeaders = []
    """ fasta filtering """
    for i in range(0, len(listOfHeaders)):
        if filterHeaderToYear(year, listOfHeaders[i]):
            filtListOfSequences.append(listOfSequences[i])
            filtListOfHeaders.append(listOfHeaders[i])
    return filtListOfSequences, filtListOfHeaders

def filterHeaderToYear(year, listOfHeaders):
    """ split header, find the needed year """
    header_split = listOfHeaders.split('-')
    if header_split[0] == year:
        return True
    return False

sequences_inp = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO']
headers_inp = ['2019-9', '2021-2', '2020-1', '2021-5', '2021-8']
sequences_out, headers_out = filteredFastaToYear('2021', sequences_inp, headers_inp)
print(len(sequences_out), len(headers_out))  # prints: 3 3