GC content in Python


I have this program to generate N random sequences and find their GC content.

import random

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])
N = 2
longest_seq = ""
shortest_seq = randseq("ATCG", 10)
for i in range(N):
    print(f'Sequence {i + 1}):')
    seq = randseq("ATCG", 10)
    if len(seq) > len(longest_seq):
        longest_seq = seq
    if len(seq) < len(shortest_seq):
        shortest_seq = seq
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent = totalG + totalC

print("The GC content is:", GCcontent)

This is the output:

Sequence 1):


Sequence 2):


The GC content is: 5

When I print the GC content, the number does not make sense. The content should be: Cs = 4, Gs = 5, total = 9. What's wrong with the code? Also, how can I show the counts for each sequence
in a dictionary? For example: Sequence 1: {A:0, T:2, C:1, G:3}

Answer:

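The original code reassigns GCcontent on every pass through the loop, so the final print only shows the G and C count of the last sequence. The corrected code initializes GCcontent to 0 before the loop and adds each sequence's count to it. It also prints the per-base counts of each sequence, as requested, using collections.Counter.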

import random
from collections import Counter

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])
N = 2
longest_seq = None
shortest_seq = None
GCcontent = 0
for i in range(N):
    print(f'Sequence {i + 1}):')
    seq = randseq("ATCG", 10)
    longest_seq = longest_seq or seq       # set longest to seq if it is None
    shortest_seq = shortest_seq or seq     # sets shortest to seq if it is None
    longest_seq = max(seq, longest_seq, key = len)
    shortest_seq = min(seq, shortest_seq, key = len)
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent += totalG + totalC
    print(f'\tSequence: {seq}')
    print(f'\tCounts: {Counter(seq)}')

print(f"The GC content is: {GCcontent}")
print(f"Longest sequence: {longest_seq}")
print(f"Shortest sequence: {shortest_seq}")

Example Run

Sequence 1):
    Sequence: GCT
    Counts: Counter({'G': 1, 'C': 1, 'T': 1})

Sequence 2):
    Sequence: AACAATAC
    Counts: Counter({'A': 5, 'C': 2, 'T': 1})

The GC content is: 4
Longest sequence: AACAATAC
Shortest sequence: GCT
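
Note that Counter only lists the bases that actually occur, so a base with zero occurrences is missing from the output above. If you want a plain dictionary that always has all four bases, including zeros as in your example, something like the following sketch should work. The helper name base_counts and the hard-coded list of sequences are just for illustration (they are not part of the code above); the two sequences are the ones from the example run.

```python
from collections import Counter

def base_counts(seq, alphabet="ATCG"):
    """Return a plain dict with one entry per base, including zero counts."""
    counts = Counter(seq)
    # A Counter returns 0 for a missing key, so absent bases come out as 0.
    return {base: counts[base] for base in alphabet}

# Illustrative input: the two sequences from the example run above.
sequences = ["GCT", "AACAATAC"]

total_gc = 0
for number, seq in enumerate(sequences, start=1):
    counts = base_counts(seq)
    # Accumulate across sequences instead of overwriting.
    total_gc += counts["G"] + counts["C"]
    print(f"Sequence {number}: {counts}")

print(f"The GC content is: {total_gc}")
```

In your own loop you would call base_counts(seq) on each randomly generated sequence in place of Counter(seq).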