Python: adding up the elements in a for loop based on a dictionary

Time:10-13

What should I add or change in my code below to get a function that finds the mean length of reads? I have to write a function, mean_length, that takes one argument: a dictionary in which keys are read names and values are read sequences. The function must return a float, which is the average length of the sequence reads. Hope someone can help me :D I am very new to coding in Python.

read_map = {'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}

def mean_lenght (read_map):
    print('keys : ',read_map.values())
    for key in read_map.keys():
        print(key) 
    #result = sum(...?)/len(read_map)
    return result
print(mean_lenght(read_map))

CodePudding user response:

A very simple one-line solution could be:

def mean_length(read_map):
    return sum([len(v) for v in read_map.values()]) / len(read_map)

Basically, you construct a list of elements, each storing the length of an entry of read_map. Then, you sum up all those lengths and divide by the number of entries in your dict.

If your dictionary is very big, constructing a list might not be the most memory-efficient approach. In that case, accumulate the total in a loop:

def mean_length(read_map):
    total = 0
    for v in read_map.values():
        total += len(v)  # running sum of the sequence lengths
    return total / len(read_map)

In this way, you do not build any intermediate list.
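A generator expression is another way to avoid the intermediate list while keeping the one-line form. The following is only a minimal sketch and not part of the original answer; the name mean_length_gen and the two-read example dictionary are made up for illustration.

```python
def mean_length_gen(read_map):
    # The generator yields one length at a time, so sum() never
    # needs a full list of lengths in memory.
    return sum(len(seq) for seq in read_map.values()) / len(read_map)


# Hypothetical two-read example: lengths 4 and 6
example = {'ReadA': 'ACGT', 'ReadB': 'ACGTAC'}
print(mean_length_gen(example))  # 5.0
```

Like the other versions, this raises ZeroDivisionError if the dictionary is empty.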

CodePudding user response:

The mean is the sum of lengths divided by the number of values, so let's just do this:

sum(map(len, read_map.values()))/len(read_map)

output: 55.333

Breakdown:

# "…" denotes the output of the previous line
read_map.values()  ->  returns the values of the dictionary
map(len, …)  -> computes the length of each sequence
sum(…)  -> get the total length
sum(…)/len(read_map)  -> divide the total length by the number of sequences = mean
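The same breakdown can be followed step by step on a small input. This is a sketch only: the dictionary reads and its three short sequences are invented for illustration, and list() is used just to make the intermediate results visible (map is lazy in Python 3).

```python
# Hypothetical three-read dictionary, small enough to check by hand
reads = {'ReadA': 'ACGT', 'ReadB': 'ACGTAC', 'ReadC': 'AC'}

values = list(reads.values())     # the sequences: ['ACGT', 'ACGTAC', 'AC']
lengths = list(map(len, values))  # length of each sequence: [4, 6, 2]
total = sum(lengths)              # total length: 12
mean = total / len(reads)         # 12 / 3 = 4.0

print(values, lengths, total, mean)
```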

As a function:

def mean_length(d):
    return sum(map(len, d.values()))/len(d)

>>> mean_length(read_map)
55.333333333333336

CodePudding user response:

get a function that finds the mean length of reads???

Python's standard-library statistics module has what you are looking for: statistics.mean. You need to compute the lengths before passing the data to that function, and the built-in len function does that.

import statistics
read_map = {'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}
print(statistics.mean(len(v) for v in read_map.values()))

output

55.333333333333336
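Since the task asks for a float, note that statistics.mean can return an int when the lengths are integers and their mean is a whole number. One option, sketched here under the assumption that Python 3.8 or later is available, is statistics.fmean, which always returns a float. The function name mean_length_stats and the example dictionary are made up for illustration.

```python
import statistics


def mean_length_stats(read_map):
    # fmean converts the data to floats and always returns a float
    return statistics.fmean(len(seq) for seq in read_map.values())


# Hypothetical two-read example: lengths 4 and 6
example = {'ReadA': 'ACGT', 'ReadB': 'ACGTAC'}
print(mean_length_stats(example))  # 5.0
```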

CodePudding user response:

def mean_length(read_map):
    total_chars = 0
    for seq in read_map.values():
        # add the length of each read sequence to the running total
        total_chars = total_chars + len(seq)
    result = total_chars / len(read_map)
    return result

I think this is the most intuitive code for a beginner.
