What is the fastest way to replace certain characters in a given string other than using str.translate()?
Given a sequence that consists only of the letters "A", "T", "G", and "C", I want to replace each "A" with "T", each "T" with "A", each "C" with "G", and each "G" with "C". To do this, I used a dictionary of ASCII code points, map = {65:84,84:65,71:67,67:71}, and called sequence.translate(map). However, in Python 3.8 this appears to be slow. I have seen people suggest using bytes or bytearray instead, but I don't know how to make that work.
It looks like I first need to encode the sequence with sequence.encode('ascii', 'ignore') and then call translate() to do the translation?
Can anybody please help me?
For example,
sequence = 'ATGCGTGCGCGACTTT'
# {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
map_dict = {65:84,84:65,71:67,67:71}
# expect 'TACGCACGCGCTGAAA'
sequence.translate(map_dict)
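Since the question asks how the bytes route works, here is a minimal sketch. The helper names `complement_bytes`, `complement_str`, `BYTES_TABLE`, and `STR_TABLE` are illustrative, not from the question. `bytes.maketrans` builds a 256-byte lookup table that `bytes.translate` and `bytearray.translate` apply, and `str.maketrans` builds the equivalent table for `str.translate`. Whether the bytes version is actually faster on your data is an assumption worth checking with `timeit`.

```python
# Bytes route: a flat 256-entry table mapping each byte to its complement.
BYTES_TABLE = bytes.maketrans(b"ATGC", b"TACG")


def complement_bytes(sequence: str) -> str:
    # Encode to ASCII (raises on non-ASCII input), translate, decode back to str.
    return sequence.encode("ascii").translate(BYTES_TABLE).decode("ascii")


# str route: str.maketrans builds the table str.translate expects,
# so there is no need to write the code points by hand.
STR_TABLE = str.maketrans("ATGC", "TACG")


def complement_str(sequence: str) -> str:
    return sequence.translate(STR_TABLE)


sequence = "ATGCGTGCGCGACTTT"
print(complement_bytes(sequence))  # TACGCACGCGCTGAAA
print(complement_str(sequence))    # TACGCACGCGCTGAAA

# bytearray supports the same translate() with the same table.
print(bytearray(b"ATGC").translate(BYTES_TABLE))  # bytearray(b'TACG')
```

If the sequence is stored as bytes from the start (for example, read from a file opened in binary mode), the encode/decode round trip can be skipped and `.translate(BYTES_TABLE)` called directly.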
Answer:
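Assuming the sequence is very long, each swap can be made O(1): maintain an index that maps each letter to the set of positions where it occurs, so that a bulk replacement only updates the index instead of the string. (Building the index and turning it back into a string are each still O(n), so this pays off only if you do many swaps before you need the string.)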
For example, given seq = "AGCTTCGA", the index is:
index = {"A": {0, 7}, "G": {1, 6}, "C": {2, 5}, "T": {3, 4}}
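The index in the example can be built in a single O(n) pass over the sequence. A minimal sketch, where `build_index` is a hypothetical helper name (the `typing` annotations keep it compatible with Python 3.8):

```python
from typing import Dict, Set


def build_index(seq: str) -> Dict[str, Set[int]]:
    """Map each letter to the set of positions where it occurs (one O(n) pass)."""
    index: Dict[str, Set[int]] = {}
    for pos, letter in enumerate(seq):
        # Create the set the first time a letter is seen, then record its position.
        index.setdefault(letter, set()).add(pos)
    return index


index = build_index("AGCTTCGA")
print(index)  # {'A': {0, 7}, 'G': {1, 6}, 'C': {2, 5}, 'T': {3, 4}}
```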
If I understand correctly, you want to swap letters:
def swap(index, charA, charB):
    # Exchange the position sets of the two letters; O(1), the string itself is untouched.
    index[charA], index[charB] = index[charB], index[charA]
swap(index, "A", "T")
print(index)
# {'A': {3, 4}, 'G': {1, 6}, 'C': {2, 5}, 'T': {0, 7}}
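The question asks for both swaps (A/T and C/G), and at some point the string has to be read back out of the index. A minimal sketch under that assumption: `swap` restates the function above with snake_case parameters, and `rebuild` is a hypothetical helper that writes every letter back to its positions in O(n).

```python
from typing import Dict, Set


def swap(index: Dict[str, Set[int]], char_a: str, char_b: str) -> None:
    # Exchange the position sets of two letters in O(1).
    index[char_a], index[char_b] = index[char_b], index[char_a]


def rebuild(index: Dict[str, Set[int]]) -> str:
    """Turn a letter -> positions index back into a string in O(n)."""
    length = sum(len(positions) for positions in index.values())
    chars = [""] * length
    for letter, positions in index.items():
        for pos in positions:
            chars[pos] = letter
    return "".join(chars)


index = {"A": {0, 7}, "G": {1, 6}, "C": {2, 5}, "T": {3, 4}}  # index of "AGCTTCGA"
swap(index, "A", "T")
swap(index, "C", "G")
print(rebuild(index))  # TCGAAGCT
```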
Answer:
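I am going to assume that you just want to replace every occurrence of one character with another. As was pointed out to me, chained str.replace() calls will not work here: replacing "A" with "T" and then "T" with "A" turns every original "A" and every original "T" into "A". Instead, you can loop over the characters with a match statement (note that match requires Python 3.10 or later, so it is not available in Python 3.8):
def complement(sequence):
    result = []
    for ch in sequence:
        # Assigning to the loop variable would not change the string, so collect the output.
        match ch:
            case "A": result.append("T")
            case "T": result.append("A")
            case "C": result.append("G")
            case "G": result.append("C")
    return "".join(result)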
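Since the question mentions Python 3.8, where the match statement does not exist, the same per-character loop can be written with a dictionary lookup. A minimal sketch; `COMPLEMENT` and `complement_loop` are illustrative names. Because it runs a Python-level loop, it is likely slower than the built-in translate methods on long sequences, so treat it as a portable fallback rather than a speedup.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_loop(sequence: str) -> str:
    # One dict lookup per character; raises KeyError for any letter outside A/T/C/G.
    return "".join(COMPLEMENT[ch] for ch in sequence)


print(complement_loop("ATGCGTGCGCGACTTT"))  # TACGCACGCGCTGAAA
```

Unlike the match version, which silently skips unexpected characters, this raises a KeyError on them, which may be preferable for validating DNA input.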