Simple DNA pattern matching

Time:06-18

I have some simple code that finds the positions of multiple patterns in a given text (specifically, for DNA sequence pattern matching):

text = "CTGATTCC"
pattern = "ATT", "CT", "TTT"
set = []
for x in pattern:
    for i in range(0, len(text)-len(x)+1):
        if text[i:(i+len(x))] == x:
            set.append(i+1)
print("Positions:", set) 

The code works as it is, but I would like it to print "Pattern is not found" for any pattern that can't be found in the text. I can't figure out where to put that inside the loop. Any ideas would really help!

CodePudding user response:

You can use a temporary list for each pattern, then check whether it is empty to see if the pattern was found:

values = []
for x in pattern:
    sub_values = []
    for i in range(0, len(text) - len(x) + 1):
        if text[i:i + len(x)] == x:
            sub_values.append((x, i + 1))
    if sub_values:
        values.extend(sub_values)
    else:
        print("Pattern", x, "not found")
print("Positions:", values)

This prints:

Pattern TTT not found
Positions: [('ATT', 4), ('CT', 1)]
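
If you would rather keep the hits grouped by pattern, the same idea can be sketched with a dictionary. This is only an illustration: `find_positions` is a made-up helper name, and it uses `str.find` instead of slicing, stepping one character at a time so that overlapping hits are kept.

```python
def find_positions(text, patterns):
    """Map each pattern to the 1-based start positions where it occurs in text."""
    result = {}
    for pat in patterns:
        hits = []
        start = text.find(pat)
        while start != -1:
            hits.append(start + 1)
            # Resume one character later so overlapping occurrences are kept
            start = text.find(pat, start + 1)
        result[pat] = hits
    return result


found = find_positions("CTGATTCC", ["ATT", "CT", "TTT"])
for pat, hits in found.items():
    if hits:
        print(pat, "found at", hits)
    else:
        print("Pattern", pat, "not found")
```

An empty list for a pattern then plays the role of the "not found" flag.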

CodePudding user response:

We could use a regex approach here. We can form an alternation of all base pair sequences you want to find. Then use re.findall to find all matches. Whatever patterns were not matched can then be obtained using set arithmetic.

import re

text = "CTGATTCC"
patterns = ["ATT", "CT", "TTT"]
# Build one alternation such as (ATT|CT|TTT)
regex = r'(' + r'|'.join(patterns) + r')'
matches = re.findall(regex, text)
misses = set(patterns) - set(matches)
print(misses)  # {'TTT'}
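
One caveat with the alternation: `re.findall` returns non-overlapping matches, so a pattern that only occurs overlapping an earlier match can be wrongly reported as a miss (for example "TTC" in "ATTC" when "ATT" is also in the list). A possible workaround, sketched below, is to search for each pattern separately with a zero-width lookahead. The helper name `regex_positions` is made up for this example.

```python
import re


def regex_positions(text, pattern):
    """Return 1-based start positions of pattern in text, overlaps included."""
    # The lookahead consumes no characters, so the scan can report
    # a hit at every index, even where hits overlap.
    lookahead = re.compile("(?=" + re.escape(pattern) + ")")
    return [m.start() + 1 for m in lookahead.finditer(text)]


text = "ATTC"
patterns = ["ATT", "TTC"]
misses = [p for p in patterns if not regex_positions(text, p)]
print(misses)
```

Here both patterns are found, so `misses` ends up empty, whereas the single alternation would only see "ATT".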

CodePudding user response:

You could do something like:

text = "CTGATTCC"
pattern = "ATT", "CT", "TTT"
set = []
match = False

for x in pattern:
    if match: match = False

    for i in range(0, len(text)-len(x)+1):
        if text[i:(i+len(x))] == x:
            set.append(i+1)
            if not match: match = True

    if not match:
        print(f"Pattern not found: {x}")

set = set if len(set) > 0 else "None"
print("Positions:", set)

This will also output

Positions: None

if nothing is appended to set.

CodePudding user response:

After the inner loop finishes, if the list "set" has not grown, it means that the pattern "x" was not found.

text = "CTGATTCC"
pattern = "ATT", "CT", "TTT"
set = []
for x in pattern:
    sLen = len(set)
    for i in range(0, len(text)-len(x)+1):
        if text[i:(i+len(x))] == x:
            set.append(i+1)
    if len(set) == sLen:
        print("Not Found: ", x)
print("Positions:", set) 

Your code still has some edge cases to think about, though: what happens if the pattern is longer than the text, or if the text or the pattern is empty? Those should also count as "not found".
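
Here is a rough sketch of how those cases behave, using a hypothetical helper called `search`. A pattern longer than the text (or an empty text) already gives an empty `range`, so nothing is reported. An empty pattern is the tricky one, because `text[i:i+0] == ""` is true at every index. Treating it as "not found" is my assumption, not something the question specifies.

```python
def search(text, pattern):
    """Return 1-based start positions of pattern in text; [] means not found."""
    # Assumption: an empty pattern counts as "not found". Without this guard
    # the empty slice would match at every index.
    if not pattern:
        return []
    # If the pattern is longer than the text, the range below is empty,
    # so no extra length check is needed.
    return [i + 1 for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]


cases = [("CTGATTCC", "ATT"), ("CT", "CTGA"), ("", "CT"), ("CTGA", "")]
for text, pattern in cases:
    hits = search(text, pattern)
    print(repr(pattern), "in", repr(text), "->", hits if hits else "not found")
```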
