Creating a list of positions of a substring within a string (DNA) (Python 3)

Time:10-15

I am doing a bioinformatics course and I am trying to write a function to find all occurrences of a substring within a string.

def find_match(s, t):
  """Returns a list of all positions of a substring t in string s.

  Takes two arguments: s & t.
  """
  occurrences = []
  for i in range(len(s) - len(t) + 1): # loop over alignment
    match = True
    for j in range(len(t)): # loop over characters
            if s[i + j] != t[j]:  # compare characters
                match = False   # mismatch
                break
            if match:   # all chars matched
                occurrences.append(i)

  return(occurrences)
    

print(find_match("GATATATGCATATACTT", "ATAT")) # [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 9, 9, 9, 9, 11, 11, 11, 13]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U")) # [1, 4, 5, 14, 16, 17]

The output above should exactly match the following:

[2, 4, 10]

[2, 5, 6, 15, 17, 18]

How can I fix this? Preferably without using regular expressions.

CodePudding user response:

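The indentation is wrong: the

if match:

check has to be outside the inner loop, so that it runs once per alignment rather than once per compared character. The expected output is also 1-based, so append i + 1 instead of i.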

def find_match(s, t):
    """Returns a list of all 1-based positions of a substring t in string s.

    Takes two arguments: s & t.
    """
    occurrences = []
    for i in range(len(s) - len(t) + 1):  # loop over alignments
        match = True
        for j in range(len(t)):  # loop over characters
            if s[i + j] != t[j]:  # compare characters
                match = False  # mismatch
                break
        if match:  # <--- This shouldn't be inside the inner for loop
            occurrences.append(i + 1)  # + 1 converts to a 1-based position

    return occurrences
    

print(find_match("GATATATGCATATACTT", "ATAT")) # [2, 4, 10]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U")) # [2, 5, 6, 15, 17, 18]
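
As a possible shorter variant of the same fix, the inner loop can be replaced by str.startswith with a start offset, which compares t against s at position i directly. This is only a sketch: the name find_match_startswith is illustrative, and returning an empty list for an empty pattern is an assumption, not something the question specifies.

```python
def find_match_startswith(s, t):
    """Return the 1-based positions of every occurrence of t in s, overlaps included."""
    if not t:
        return []  # assumption: an empty pattern yields no matches
    # Try every alignment at which t could still fit inside s.
    return [i + 1 for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


print(find_match_startswith("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
print(find_match_startswith("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```

No regular expressions are involved, and overlapping matches such as the ATAT at positions 2 and 4 are both reported.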

CodePudding user response:

You can also do this with str.find; sorting the de-duplicated results returns the positions in order:

def find_match(s, t):
    return sorted({s.find(t, i) + 1 for i in range(len(s)) if s.find(t, i) != -1})

Output:

In [1]: find_match("AUGCUUCAGAAAGGUCUUACG", "U")
Out[1]: [2, 5, 6, 15, 17, 18]

In [2]: find_match("GATATATGCATATACTT", 'ATAT')
Out[2]: [2, 4, 10]

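str.find(t, i) returns the index of the first occurrence of t at or after index i, or -1 if there is none. Calling it from every start index and filtering out the -1 results therefore collects every match. The same match is reported from several start indices, so the set removes the duplicates. Adding 1 converts each index to a 1-based position.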

In [1]: "GATATATGCATATACTT".find('ATAT', 0)
Out[1]: 1
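
Because find accepts a start index, one way to avoid the duplicate results is to resume the search one character past each hit rather than calling find from every index. A minimal sketch of that idea follows; find_match_scan is a hypothetical name, and treating an empty pattern as "no matches" is an assumption.

```python
def find_match_scan(s, t):
    """Return the 1-based positions of every occurrence of t in s, overlaps included."""
    positions = []
    if not t:
        return positions  # assumption: an empty pattern yields no matches
    start = s.find(t)
    while start != -1:
        positions.append(start + 1)  # convert the 0-based index to a 1-based position
        # Resume one character past the last hit so overlapping matches are kept.
        start = s.find(t, start + 1)
    return positions


print(find_match_scan("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
```

Each match is found exactly once and in order, so no set or sort is needed.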