comparing values in two lists

Time:12-06

I have two strings and want to compare them character by character, then print the number of positions that do not match. I wrote some code, but it prints the wrong number: '0', when it should be '7'. This is my code:

seq_1='GAGCCTACTAACGGGAT'
seq_2='CATCGTAATGACGGCCT'

for i, y in zip(seq_1, seq_2):
    s1=[]
    s2=[]
    s1.append(i)
    s2.append(y)

s1.sort()
s2.sort()

count=0
if s1!=s2:
    count += 1

CodePudding user response:

A good option is to use enumerate(), which gives you the index of each pair of characters.

for i, x in enumerate(zip(seq_1, seq_2)):
    if x[0] != x[1]:
        print(i, x)

This prints the index of each mismatch along with the pair of differing characters:

0 ('G', 'C')
2 ('G', 'T')
4 ('C', 'G')
7 ('C', 'A')
9 ('A', 'G')
14 ('G', 'C')
15 ('A', 'C')
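
If you need the count as well as the positions, one possible variation is to collect the mismatching indices in a list and take its length. This is only a sketch; the name mismatch_positions is made up for illustration.

```python
seq_1 = 'GAGCCTACTAACGGGAT'
seq_2 = 'CATCGTAATGACGGCCT'

# Indices at which the two sequences differ
mismatch_positions = [i for i, (a, b) in enumerate(zip(seq_1, seq_2)) if a != b]

print(mismatch_positions)       # [0, 2, 4, 7, 9, 14, 15]
print(len(mismatch_positions))  # 7
```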

CodePudding user response:

This has been answered several times on Stack Overflow.

Here is a solution:

numNonMatchingBasePairs = sum(c1 != c2 for c1, c2 in zip(seq_1, seq_2))

where c1 and c2 are the characters at the same position in your two strings seq_1 and seq_2. Note that this option is most likely the fastest of the current answers, though that might change.
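
In case it is not obvious why summing comparisons gives a count: each c1 != c2 is a bool, and in Python bool is a subclass of int, so True adds 1 and False adds 0. A small sketch to make that visible (the list flags is only there for illustration, the one-liner above does not need it):

```python
seq_1 = 'GAGCCTACTAACGGGAT'
seq_2 = 'CATCGTAATGACGGCCT'

# One bool per position: True where the characters differ
flags = [c1 != c2 for c1, c2 in zip(seq_1, seq_2)]

print(flags[:3])   # [True, False, True]
print(sum(flags))  # 7, because True counts as 1 and False as 0
```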

CodePudding user response:

@Gameplay there is no need to sort the strings beforehand. Sorting orders the characters lexicographically (by their ordinal value) and can therefore even yield WRONG results, because we are looking not only at how often each base occurs in the strings but above all at its position. By sorting, you effectively lose the position information of the bases in the two sequences.
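
A tiny illustration of that point, using two made-up four-letter sequences a and b that contain the same bases in a different order:

```python
a = 'ACGT'
b = 'TGCA'

# After sorting, both become ['A', 'C', 'G', 'T'], so they compare equal
print(sorted(a) == sorted(b))  # True

# Compared position by position, every character differs
print(sum(x != y for x, y in zip(a, b)))  # 4
```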

CodePudding user response:

You can try using map and sum:

seq_1='GAGCCTACTAACGGGAT'
seq_2='CATCGTAATGACGGCCT'
print(sum(map(lambda i:i[0]!=i[1],zip(seq_1,seq_2))))

# 7
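
A possible variation on the same idea, if you prefer to avoid the lambda: map accepts several iterables and passes one item from each to the function, so operator.ne from the standard library can compare the characters directly. Just a sketch, not necessarily faster or better.

```python
import operator

seq_1 = 'GAGCCTACTAACGGGAT'
seq_2 = 'CATCGTAATGACGGCCT'

# operator.ne(a, b) is the same as a != b; map pairs up the characters itself
non_matches = sum(map(operator.ne, seq_1, seq_2))
print(non_matches)  # 7
```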

CodePudding user response:

seq_1='GAGCCTACTAACGGGAT'
seq_2='CATCGTAATGACGGCCT'

non_matched = 0
for i, y in zip(seq_1, seq_2):
    if i!=y:
        non_matched += 1

print(non_matched)
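
One thing to keep in mind with any zip-based approach: zip stops at the end of the shorter string, so extra characters in the longer one are silently ignored. If that matters for your data, a sketch like the following could guard against it. The function name count_mismatches and the choice to raise ValueError are assumptions for illustration, not something from the question.

```python
def count_mismatches(s1, s2):
    """Return the number of positions at which s1 and s2 differ."""
    if len(s1) != len(s2):
        # zip() would silently drop the extra characters of the longer string
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


print(count_mismatches('GAGCCTACTAACGGGAT', 'CATCGTAATGACGGCCT'))  # 7
```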