Home > other >  Differences in result in two similar functions: finding the key with maximun value
Differences in result in two similar functions: finding the key with maximun value

Time:08-05

I am currently having an issue. Basically, I have 2 similar functions in terms of concept but the results do not align. These are the codes I learned from Bioinformatics I on Coursera.

The first code is simply creating a dictionary of occurrences of each k-mer pattern from a text (which is a long stretch of nucleotides). In this case, k is 5.

def FrequencyMap(text,k):
  freq ={}
  for i in range (0, len(text)-k 1):
    freq[text[i:i k]]=0
    for j in range (0, len(text)-k 1):
        if text[j:j k] == text[i:i k]:
            freq[text[i:i k]]  =1
  return freq, max(freq)

The text and the result dictionary are kinda long, but the main point is when I call max(freq), it returns the key 'TTTTC', which has a value of 1.

Meanwhile, I wrote another code that is simply based on the previous code to generate the 5-mer patterns that have the max values (number of occurrences in the text).

def FrequentWords(text, k):
  a = FrequencyMap(text, k)
  m = max(a.values())
  words = []
  for i in a:
    if a[i]==m:
        words.append(i)
  return words,m

And this code returns 'ACCTA', which has the value of 99, meaning it appears 99 times in the text. This makes total sense.

I used the same text and k (k=5) for both codes. I ran the codes on Jupyter Notebook. Why does the first one not return 'ACCTA'?

Thank you so much,

Here is the text, if anyone wants to try: "ACCATCCCTAGGGCATACCTAAGTCTACCTAAAAGGCTACCTAATACCATACCTAATTACCTAACTACCTAAAATAAGTCTACCTAATACCTAATACCTAAAGTTACCTAACGTACCTAATACCTAATACCTAACCACTACCTAATCCGATTTACCTAACAACCGATCGAGTACCTAATCGATACCTAAATAACGGACAATATACCTAATTACCTAATACCTAATACCTAAGTGTACCTAAGACGTCTACCTAATTGTACCTAACTACCTAATTACCTAAGATTAATACCTAATACCTAATTTACCTAATACCTAACGTGGACTACCTAATACCTAACTTTTCCCCTACCTAATACCTAACTGTACCTAAATACCTAATACCTAAGCTACCTAAAGAACAACATTGTACGTGCGCCGTACCTAAATACCTAACAACTACCTAACTGATACCTAATAGTGATTACCTAACGCTTCTACCTAACTACCTAAGTACCTAACGCTACCTAACTACCTAATGTCCACAAAATACCTAATACCTAATAGCTACCTAATTGTGTACCTAAGTACCTAACCTACCTAATAATACCTAAAAATACCTAAGTACCTAACGTACCTAAATTTTACCTAATCTACCTAACGTACCTAATACCTAATTATACCTAATTACCTAATGGTTACCTAAGTTACCTAATATGCCACTACCTAACCTTACCTAAGACCTACCTAATAGGTACCTAACTGGGTACCTAAGGCAGTTTACCTAATTCAGGGCTACCTAATGTACCTAATACCTAAGTACCTAATACCTAATCCCATACCTAATATTTACCTAAGGGCACCGGTACCTAATACCTAATACCTAATACCTAAACCTTCGTACCTAAATACCTAATCTACCTAATGTACCTAAGGTACCTAATACCTAAGTCACTACCTAATACCTAATACCTAATGGGAGGAGCTTACCTAAGGTTACCTAATTACCTAAATACCTAATCGTTACCTAA"

CodePudding user response:

Why does the first one not return 'ACCTA'?

Because max(freq) returns the maximum key of the dictionary. In this case the keys are strings (the k-mers), and strings are compared alphabetically. Hence the maximum one is the last string when the are sorted alphabetically.

If you want the first function to return the k-mer that occurs most often, you should change max(freq) to max(freq.items(), key=lambda key_value_pair: key_value_pair[1])[0]. Here, you are sorting the (kmer, count) pairs (that's the key_value_pair parameter of the lambda expression) based on the frequency and then selecting the kmer.

  • Related