Use regex to find a repeated pattern in python

Time:01-03

My question is: can I repeat the pattern of interest while using regex? For example, I am looking for words in a file (each line is just one word, which makes it easier) that consist only of a consonant followed by a vowel, repeated any number of times. This means 'banana' is allowed, but 'bananas', 'banaana', 'bananna' and so on are not allowed. 'ba' is also allowed, and so is 'bana', and so on. Basically, the pattern I want to repeat is:

[bcdfghjklmnpqrstvwxyz]{1}[aeiouy]{1}

What I did was this (the pattern is the same as above, but with Greek letters):

import re
def f(x):
    res_count = 0
    regex_list = ['^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  '^[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}[βγδζθκλμνξπρστφχψ]{1}[αεηιουω]{1}$',
                  ]
    with open(x) as greek_words:
        for words in greek_words:
            for w_pat in regex_list:
                result = re.findall(w_pat,words)
                if result:
                    res_count += 1
                    corrected = str(result).strip('[]\'')
                    with open('easy_words_for_children.txt', 'a') as g:
                        g.write(f'{corrected}\n')
                    result = False
        return res_count 
f('words_greek_normalized.txt')

So I am just manually repeating the intended pattern, but I wanted to see if there is another way to get the same output. The rest of the code just writes the results to another file.

CodePudding user response:

You're just looking to repeat a pattern, so this works:

import re

# first two match, the rest don't
some_words = ['banana', 'cola', 'cocoa', 'hear', 'agape', 'letter']

# y is not technically a vowel, but that's not an issue here
expression = '^(?:[bcdfghjklmnpqrstvwxyz][aeiouy])+$'

for word in some_words:
    if re.match(expression, word):
        print(word)

Output:

banana
cola

So the fix is just to wrap the part that needs to be repeated in (?:...)+. The + means "one or more times", the parentheses group what you're repeating, and the ?: means you're interested in the grouping but not in capturing the grouped part separately; you just want to match the whole thing.

Note that you don't need the {1} - the default is to match it just once unless you tell the regex engine otherwise.
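Applied to the file task from the question, the same idea could look roughly like the sketch below. The Greek character classes are copied from the question; the function names, the use of re.fullmatch instead of explicit anchors, and opening the output file in 'w' mode (the question appends with 'a') are my own assumptions, so adjust them as needed.

```python
import re

# Character classes copied from the question; the constant names are illustrative.
CONSONANTS = "βγδζθκλμνξπρστφχψ"
VOWELS = "αεηιουω"

# One consonant followed by one vowel, repeated one or more times.
CV_WORD = re.compile(rf"(?:[{CONSONANTS}][{VOWELS}])+")


def is_cv_word(word):
    """Return True if the whole word consists of consonant-vowel pairs."""
    return CV_WORD.fullmatch(word) is not None


def filter_words(lines):
    """Return the stripped lines made up only of consonant-vowel pairs."""
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if is_cv_word(word)]


def copy_easy_words(src_path, dst_path):
    """Write matching words from src_path to dst_path; return how many matched."""
    with open(src_path, encoding="utf-8") as src:
        matches = filter_words(src)
    # Assumption: overwrite the output file rather than append to it.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(f"{word}\n" for word in matches)
    return len(matches)
```

Calling copy_easy_words('words_greek_normalized.txt', 'easy_words_for_children.txt') would then replace the whole list of hand-written patterns. Because fullmatch has to cover the entire string, each line is stripped of its trailing newline first.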

CodePudding user response:

Find from beginning to end one consonant and one vowel, repeated one or more times.

>>> import re
>>> s1 = "banana"
>>> s2 = "baanana"
>>> re.match(r'\A(?:[bcdfghjklmnpqrstvwxyz][aeiou])+\Z', s1)
<re.Match object; span=(0, 6), match='banana'>
>>> re.match(r'\A(?:[bcdfghjklmnpqrstvwxyz][aeiou])+\Z', s2)

For Greek, use r'\A(?:[βγδζθκλμνξπρστφχψ][αεηιουω])+\Z'.

Or allowing for an optional trailing consonant:

>>> cons = 'bcdfghjklmnpqrstvwxyz'
>>> vow = 'aeiou'
>>> re.match(rf'\A(?:[{cons}][{vow}])+[{cons}]?\Z', 'bananas')
<re.Match object; span=(0, 7), match='bananas'>
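
One detail that matters when the words come straight from a file: in Python, $ also matches just before a newline at the very end of the string, while \Z matches only at the true end. The small sketch below (variable names are mine, purely for illustration) shows the difference on a line that still carries its newline, and that stripping the line first sidesteps the issue.

```python
import re

# Consonant-vowel pair, repeated one or more times (English letters, as above).
pair = r"(?:[bcdfghjklmnpqrstvwxyz][aeiou])+"

line = "banana\n"  # a line as read from a file, newline included

# $ is allowed to match just before a final newline, so this succeeds.
dollar = re.match(rf"^{pair}$", line)

# \Z only matches at the very end of the string, so the newline makes this fail.
strict = re.match(rf"\A{pair}\Z", line)

# Stripping the line and using fullmatch avoids the question altogether.
stripped = re.fullmatch(pair, line.strip())

print(dollar is not None)    # True
print(strict is not None)    # False
print(stripped is not None)  # True
```

So with \A...\Z you need to strip each line before matching, whereas ^...$ tolerates the trailing newline.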