I have this program to generate N random sequences and find the GC content.
import random

def randseq(abc, length):
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
longest_seq = ""
shortest_seq = randseq("ATCG", 10)
for i in range(N):
    print(f'Sequence {i + 1}):')
    seq = randseq("ATCG", 10)
    if len(seq) > len(longest_seq):
        longest_seq = seq
    if len(seq) < len(shortest_seq):
        shortest_seq = seq
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent = totalG + totalC
    print(seq)
print("The GC content is:", GCcontent)
This is the output:
Sequence 1):
TCGGTG
Sequence 2):
GCATCGTCAA
The GC content is: 5
When I print the GC content, it does not make sense. The content should be: Cs = 4, Gs = 5, Total = 9. What's wrong with the code? Also, how can I show the counts for each sequence in a dictionary? For example: Sequence 1: {A:0, T:2, C:1, G:3}
CodePudding user response:
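Code correction plus output of counts as requested. The original assigned GCcontent anew on every pass through the loop, so only the last sequence's count was printed; initializing it to 0 before the loop and accumulating with += gives the total across all sequences.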
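As a minimal illustration of the fix, here is a sketch using the two sequences from the question's output. The variable names `overwritten` and `accumulated` are made up for this example: plain assignment inside the loop keeps only the last count, while `+=` keeps a running total.

```python
seqs = ["TCGGTG", "GCATCGTCAA"]  # the two sequences from the question's output

overwritten = 0
accumulated = 0
for seq in seqs:
    gc = seq.count("G") + seq.count("C")
    overwritten = gc     # plain assignment: keeps only the last sequence's count
    accumulated += gc    # running total across all sequences

print(overwritten)  # 5
print(accumulated)  # 9
```

The full program with this change: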
import random
from collections import Counter

def randseq(abc, length):
    # Build a random sequence of 1 to `length` characters drawn from abc
    return "".join([random.choice(abc) for i in range(random.randint(1, length))])

N = 2
longest_seq = None
shortest_seq = None
GCcontent = 0  # running total of G and C over all sequences

for i in range(N):
    print(f'Sequence {i + 1}):')
    seq = randseq("ATCG", 10)
    longest_seq = longest_seq or seq    # set longest to seq if it is None
    shortest_seq = shortest_seq or seq  # set shortest to seq if it is None
    longest_seq = max(seq, longest_seq, key=len)
    shortest_seq = min(seq, shortest_seq, key=len)
    totalG = seq.count("G")
    totalC = seq.count("C")
    GCcontent += totalG + totalC  # accumulate instead of overwriting
    print(f'\tSequence: {seq}')
    print(f'\tCounts: {Counter(seq)}')
    print()

print(f"The GC content is: {GCcontent}")
print(f"Longest sequence: {longest_seq}")
print(f"Shortest sequence: {shortest_seq}")
Example Run
Sequence 1):
    Sequence: GCT
    Counts: Counter({'G': 1, 'C': 1, 'T': 1})

Sequence 2):
    Sequence: AACAATAC
    Counts: Counter({'A': 5, 'C': 2, 'T': 1})

The GC content is: 4
Longest sequence: AACAATAC
Shortest sequence: GCT
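
Note that Counter only records characters that actually occur, so bases with a count of zero are left out, whereas the question's example includes A:0. One possible way to get a plain dictionary with all four bases is sketched below. The helper name `base_counts` is hypothetical and not part of the code above.

```python
def base_counts(seq, alphabet="ATCG"):
    """Return a plain dict with a count for every base, including zeros."""
    return {base: seq.count(base) for base in alphabet}

# First sequence from the question's output
print(base_counts("TCGGTG"))  # {'A': 0, 'T': 2, 'C': 1, 'G': 3}
```

Inside the loop, `print(f'\tCounts: {base_counts(seq)}')` could then take the place of the Counter line.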