My Python Regex code isn't finding consecutive sets of characters

Time:10-04

I'm trying to write a program that checks whether a .txt file contains a word with three consecutive sets of double letters (e.g. bookkeeper). So far I have:

import re

text = open(r'C:\Users\Jimbo.Wimbo\Desktop\List.txt')

for line in text:
   
   x = re.finditer(r'((\w)\2) ', line)
   if True:
      print("Yes")
   else:
      print("No")

List.txt has 5 words. Only the last word has three consecutive sets of double letters, but the program prints "Yes" five times. How can I fix it using re and os?

CodePudding user response:

You don't need re.finditer(), you can just use re.search().

Your regexp is wrong: it matches a single pair of duplicate characters, not three pairs in a row.

if True: doesn't do anything useful. It doesn't mean "if the last assignment was a truthy value". You need to test the result of the regexp search.
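To illustrate the point about testing the result, here is a minimal sketch. The sample words are made up, and the note about iterator truthiness is an addition to the answer above: re.finditer() always returns an iterator object, which is truthy even when it would yield nothing, while re.search() returns None when there is no match.

```python
import re

pattern = r'(\w)\1'  # one doubled character, e.g. "oo"

for word in ["book", "cat"]:  # hypothetical sample words
    iterator = re.finditer(pattern, word)
    match = re.search(pattern, word)
    # The iterator is truthy regardless of whether anything matches;
    # the search result is a Match object or None.
    print(word, bool(iterator), bool(match))
```

So a condition such as `if re.search(pattern, line):` actually depends on the line, whereas `if True:` or `if x:` with a finditer() result does not.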

Your code prints Yes or No for every line in the file. To print a single answer, use any() to test whether the pattern matches any line in the file.

if any(re.search(r'((\w)\2){3}', line) for line in text):
    print('Yes')
else:
    print('No')
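As a quick check, here is a self-contained sketch that applies the same idea to an in-memory word list instead of List.txt. The sample words and the helper name has_triple_double are assumptions made for illustration only.

```python
import re

# Three back-to-back doubled characters, e.g. "ookkee" in "bookkeeper".
TRIPLE_DOUBLE = re.compile(r'((\w)\2){3}')

def has_triple_double(word):
    """Return True if word contains three consecutive doubled characters."""
    return TRIPLE_DOUBLE.search(word) is not None

# Hypothetical sample data standing in for the lines of the file.
words = ["letter", "balloon", "committee", "bookkeeper"]

for word in words:
    print(word, has_triple_double(word))

# One overall answer, as with any() over the file's lines.
print("Yes" if any(has_triple_double(w) for w in words) else "No")
```

Only "bookkeeper" should report True here: "balloon" and "committee" each have at most two doubled pairs next to each other.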

CodePudding user response:

I think your regex is incorrect. A good way to check a regex is to use an online regex checker, where you can test it against any number of strings you provide.
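The same kind of check can also be done locally. The following is only a rough sketch: the candidate patterns, the test strings, and the helper name matched_text are all made up for illustration.

```python
import re

# Candidate patterns to compare; the labels are only for the printout.
patterns = {
    "one pair": r'(\w)\1',
    "three pairs in a row": r'((\w)\2){3}',
}

# Hypothetical test strings, not the contents of List.txt.
samples = ["bookkeeper", "balloon", "letter", "cat"]

def matched_text(pattern, text):
    """Return the substring the pattern matched, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for label, pattern in patterns.items():
    for sample in samples:
        print(f"{label:22} {sample:12} {matched_text(pattern, sample)!r}")
```

Printing the matched substring, rather than just Yes or No, makes it easier to see when a pattern matches less than intended.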

Here is one possible solution to your query:

import re

text = open(r'C:\Users\Jimbo.Wimbo\Desktop\List.txt')

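for line in text:
    line = line.strip()  # drop the trailing newline so the output stays on one line
    # Count doubled characters anywhere in the line (the pairs need not be adjacent)
    x = len(re.findall(r'(.)\1', line))

    if x == 3:
        print(f"Found a word with 3 duplicate letters : {line}")
    else:
        print(f"Word: {line}, Duplicate letters : {x}")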
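One thing to keep in mind is that counting doubled letters is not the same as requiring them to be consecutive. The sketch below contrasts the two checks; the function names and sample words are assumptions for illustration, and the second pattern is one possible way to require three pairs in a row.

```python
import re

def count_doubles(word):
    """Count non-overlapping doubled characters anywhere in the word."""
    return len(re.findall(r'(.)\1', word))

def has_three_in_a_row(word):
    """Return True only if three doubled characters appear back to back."""
    return re.search(r'((\w)\2){3}', word) is not None

# Hypothetical sample words.
for word in ["committee", "bookkeeper"]:
    print(word, count_doubles(word), has_three_in_a_row(word))
```

Both words contain three doubled letters, but in "committee" the "mm" is separated from "ttee" by an "i", so only "bookkeeper" passes the consecutive check.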

Hope this helps.
