Regex pattern for a phrase that appears in multiple ways

Time:06-17

My goal is to go over a text file and count the number of times the phrase 'oh my god' is written. The phrase can appear in different forms, such as 'omg', 'oh-my-god', 'oh my god!'... I've tried this pattern, but it misses some occurrences and doesn't count all of them:

regex = re.compile(r'\b(omg|(oh[^A-Za-z0-9]my[^A-Za-z0-9]god))')

CodePudding user response:

You could write it like

\b(?:omg|oh([ -])my\1god)\b

The pattern matches:

  • \b a word boundary
  • (?: non-capture group for the alternatives
    • omg match literally
    • | Or
    • oh
    • ([ -]) capture group 1, match either a space or -
    • my match literally
    • \1 backreference to match the same as group 1
    • god match literally
  • ) close the non-capture group
  • \b a word boundary
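
As a minimal sketch of how this pattern could be used for counting in Python: the sample text and the `count_omg` helper name are made up for illustration, and `re.IGNORECASE` is an added assumption so that variants like 'OMG' are counted too.

```python
import re

# Pattern from the answer above; group 1 captures the separator so the
# same character must appear between "oh"/"my" and "my"/"god".
OMG_PATTERN = re.compile(r"\b(?:omg|oh([ -])my\1god)\b", re.IGNORECASE)


def count_omg(text):
    """Return how many times the phrase occurs in text (hypothetical helper)."""
    # finditer yields one match object per occurrence; findall would
    # instead return the contents of group 1 for each match.
    return sum(1 for _ in OMG_PATTERN.finditer(text))


sample = "omg, oh my god! Oh-my-god... oh my-god and ohmygod"
print(count_omg(sample))  # prints 3
```

Here 'oh my-god' is skipped because the backreference requires both separators to be the same, and 'ohmygod' is skipped because a separator is mandatory.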


CodePudding user response:

Regex is always difficult, but this should work for you. A helpful resource for testing and refining regular expressions is https://pythex.org/

I've written this solution so that you can use it with either a list of phrases (named dictionary in the code) or an entire block of text (i.e. a string).

    # Python 3
    import re

    # Sample text to search. The parentheses join the two string literals,
    # and the trailing space keeps "god!" and "Oh" from running together.
    target_string = (
        "oh my god omg oh-my-god oh my god! oh my god! oh my god Oh my god OMG Oh-my-god Oh my god! "
        "Oh my god! Oh my god Oh My God OmG Oh-My-God Oh My God! Oh My god! Oh My God the ggod game game god godohmygod 132 !@#$%^&*()"
    )

    # Phrases to test individually (a list, despite the name).
    # Every entry is separated by a comma so that adjacent literals are not silently concatenated.
    dictionary = ['oh my god', 'omg', 'oh-my-god', 'oh my god!', 'oh, my god!', 'oh, my god', 'Oh my god', 'OMG', 'Oh-my-god',
                  'Oh my god!',
                  'Oh, my god!', 'Oh, my god', 'Oh My God', 'OmG', 'Oh-My-God', 'Oh My God!', 'Oh, My god!', 'Oh, My God',
                  'the ggod game', 'game', 'god godohmygod', '132', '!@#$%^&*()']


    # Compiled once and shared by both functions. re.IGNORECASE covers
    # capitalised forms such as "Oh my god", which the case-specific
    # alternatives alone did not match; the dots in g.o.d are escaped.
    regex = re.compile(
        r"(?:oh|o)(?: |-|,|!|\.)*(?:my|m'y|m)*(?: |-|,|!|\.)*(?:god|g-o-d|g\.o\.d|g)(?: |-|,|!|\.)*",
        re.IGNORECASE)


    # Loop through the dictionary and print the phrases that match the regular expression
    def match_phrase():
        for p in dictionary:
            # match() only tests at the start of each phrase
            if regex.match(p):
                print("Matching words in dictionary: ", p)


    # Search a string of text and print all matching results
    def match_text_string():
        # findall() returns every non-overlapping match as a list of strings
        result = regex.findall(target_string)
        print("Matching words in target_string: ", result)


    if __name__ == '__main__':
        match_phrase()
        match_text_string()
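
Since the question asks for a count over a text file, here is a rough sketch of that last step. It uses an assumed, simplified pattern rather than the exact one above, and the names `count_phrase` and `count_phrase_in_file` are hypothetical; the file is assumed to be UTF-8.

```python
import re
from pathlib import Path

# Assumed, simplified pattern (not the exact one from the answer above):
# "omg", or "oh", "my", "god" separated by any run of whitespace, commas,
# periods, exclamation marks, or hyphens. Case is ignored.
PHRASE = re.compile(r"\b(?:omg|oh[\s,.!-]*my[\s,.!-]*god)\b", re.IGNORECASE)


def count_phrase(text):
    """Count non-overlapping occurrences of the phrase in a string."""
    # The pattern has no capture groups, so findall returns whole matches.
    return len(PHRASE.findall(text))


def count_phrase_in_file(path):
    """Count occurrences in a text file (hypothetical helper; UTF-8 assumed)."""
    # Reading the whole file at once lets a phrase that spans a line break
    # still match, because \s also matches newlines.
    return count_phrase(Path(path).read_text(encoding="utf-8"))


sample = "oh my god omg oh-my-god Oh, My God! OMG the ggod game godohmygod"
print(count_phrase(sample))  # prints 5
```

The word boundaries keep 'godohmygod' from being counted; drop them if embedded occurrences should count as well.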