Using a dictionary, I would like to count every two-character substring (overlaps included) of the string "ACCTAGCCCTA" and get {'AC': 1, 'CC': 3, 'CT': 2, 'TA': 2, 'AG': 1, 'GC': 1},
but my code gives me {'AC': 1, 'CC': 2, 'CT': 2, 'TA': 2, 'AG': 1, 'GC': 1}, where CC is counted only 2 times.
p = input("Enter a string: ")
dic = {p[i:i+2]: p.count(p[i:i+2]) for i in range(len(p)-1)}
print(dic)
Why is that and what do I need to change?
CodePudding user response:
Another approach, with zip and Counter:
In [1]: from collections import Counter
In [2]: s = 'ACCTAGCCCTA'
In [3]: dict(Counter(map(lambda x: x[0] + x[1], zip(s, s[1:]))))
Out[3]: {'AC': 1, 'CC': 3, 'CT': 2, 'TA': 2, 'AG': 1, 'GC': 1}
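The same zip idea can be stretched to substrings of any length by zipping k shifted copies of the string. A minimal sketch, assuming a helper name (kmer_counts) that is purely illustrative and not part of any library:

```python
from collections import Counter

def kmer_counts(s, k):
    """Count every overlapping substring of length k (hypothetical helper)."""
    # s[0:], s[1:], ..., s[k-1:] zipped together yield one tuple per window;
    # zip stops at the shortest copy, so no window runs past the end.
    shifted = (s[i:] for i in range(k))
    return dict(Counter("".join(chars) for chars in zip(*shifted)))

pairs = kmer_counts("ACCTAGCCCTA", 2)
print(pairs)
```

With k=2 this should give the same dictionary as Out[3] above.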
CodePudding user response:
For an efficient solution, use collections.Counter and a generator expression:
s = 'ACCTAGCCCTA'
from collections import Counter
out = dict(Counter(s[i:i+2] for i in range(len(s)-1)))
Output:
{'AC': 1, 'CC': 3, 'CT': 2, 'TA': 2, 'AG': 1, 'GC': 1}
Variant without import:
out = {}
for i in range(len(s)-1):
    out[s[i:i+2]] = out.get(s[i:i+2], 0) + 1
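As for why the code in the question reports CC only twice: it relies on str.count, which counts non-overlapping occurrences, so the run CCC contributes one CC instead of two. A small sketch of the difference (count_overlapping is a hypothetical helper written just for this illustration):

```python
s = "ACCTAGCCCTA"

# str.count resumes searching after the end of each match,
# so the run "CCC" yields only one "CC".
non_overlapping = s.count("CC")

def count_overlapping(text, sub):
    """Count occurrences of sub in text, overlaps included (hypothetical helper)."""
    # Test every possible start position instead of skipping past matches.
    return sum(text.startswith(sub, i) for i in range(len(text) - len(sub) + 1))

overlapping = count_overlapping(s, "CC")
print(non_overlapping, overlapping)  # 2 3
```

Counting each window once, as the Counter and dict.get versions do, avoids calling str.count at all.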
CodePudding user response:
There's actually an itertools recipe for this, sliding_window, but to import it you have to install the third-party more_itertools package:
In [41]: from more_itertools import sliding_window
In [42]: s = "ACCTAGCCCTA"
In [43]: from collections import Counter
In [44]: Counter(sliding_window(s,2))
Out[44]:
Counter({('A', 'C'): 1,
         ('C', 'C'): 3,
         ('C', 'T'): 2,
         ('T', 'A'): 2,
         ('A', 'G'): 1,
         ('G', 'C'): 1})
In [45]: Counter(sliding_window(s,4))
Out[45]:
Counter({('A', 'C', 'C', 'T'): 1,
         ('C', 'C', 'T', 'A'): 2,
         ('C', 'T', 'A', 'G'): 1,
         ('T', 'A', 'G', 'C'): 1,
         ('A', 'G', 'C', 'C'): 1,
         ('G', 'C', 'C', 'C'): 1,
         ('C', 'C', 'C', 'T'): 1})
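Note that the keys here are tuples, not strings. If installing a package is not an option, a local function modeled on the sliding_window recipe from the itertools documentation, plus a join, should give string keys. This is a sketch under that assumption, not the more_itertools implementation itself:

```python
from collections import Counter, deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping tuples of length n (local copy modeled on the itertools recipe)."""
    it = iter(iterable)
    # Pre-fill with the first n-1 items; each further item completes a window.
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

s = "ACCTAGCCCTA"
# Join each tuple back into a string so the keys look like 'AC', 'CC', ...
counts = dict(Counter("".join(w) for w in sliding_window(s, 2)))
print(counts)
```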