Case-insensitive section of a pattern

Time:12-27

Does Python have something like Vim's \c, where a flag can be written inline so that only a portion of the pattern is affected, for example made case-insensitive? Here is an example:

re.search(r'he\cllo', string)

\c being the case-insensitive inline indicator. Or is it all or nothing in Python with the re.I flag?

CodePudding user response:

Python has an atypical way of implementing the case-insensitive inline modifier.

  • It can be enabled globally, using (?i). This applies to the entire pattern...
    • ...except that it may be selectively disabled for a group: (?-i:...).
    • There is no "disable global" flag.
  • It can be enabled selectively for a group: (?i:...)
    • Anything outside of the group is not affected when the flag is applied inside the group.
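Applied to the pattern from the question, the scoped form could look like the sketch below. The sample strings are made up for illustration, and the last line shows the all-or-nothing re.I flag for comparison.

```python
import re

# Only the 'l' inside (?i:...) is case-insensitive; the rest stays case-sensitive.
pattern = re.compile(r"he(?i:l)lo")

print(bool(pattern.search("say heLlo")))  # True: 'L' matches the scoped 'l'
print(bool(pattern.search("say hello")))  # True
print(bool(pattern.search("say HELLO")))  # False: 'h', 'e', 'l', 'o' outside the group are case-sensitive

# For comparison, re.I makes the whole pattern case-insensitive.
print(bool(re.search(r"hello", "say HELLO", re.I)))  # True
```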

Here are some examples.

Using the global flag:

import re

q = "HeLLo WorLD"

re.match(r"(?i)he(?-i:LL)o\swoRld", q) # this matches
re.match(r"(?i)he(?-i:ll)o\swoRld", q) # this doesn't match, since 'LL' != 'll'
re.match(r"(?i)he(?-i:Ll)o\swoRld", q) # nor does this, 'LL' != 'Ll'

This can be done multiple times. Only the characters enclosed in the groups will be treated as case-sensitive:

q="HeLLo WorLD"

re.match(r"(?i)he(?-i:LL)o\swo(?-i:r)ld", q) # this matches: 'LL' == 'LL' and 'r' == 'r'
re.match(r"(?i)he(?-i:LL)o\swo(?-i:R)ld", q) # but this doesn't, 'LL' == 'LL' but 'R' != 'r'

The global flag can be written anywhere in the pattern, but placing it anywhere other than the start is deprecated and raises a DeprecationWarning. As of Python 3.8 it still works and follows the same rules; from Python 3.11 onward it is rejected with an error.
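Since the handling of a global flag in the middle of a pattern depends on the interpreter version, a small probe can report what the running Python actually does. This is only a sketch; the function name compile_with_late_flag is made up for illustration.

```python
import re
import warnings

def compile_with_late_flag(pattern):
    """Report how this interpreter treats a global flag that is not at the start."""
    re.purge()  # clear the compile cache so the pattern is really compiled again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure DeprecationWarning is not filtered out
        try:
            re.compile(pattern)
        except re.error as exc:
            return "error: " + str(exc)
        if any(issubclass(w.category, DeprecationWarning) for w in caught):
            return "deprecation warning"
    return "accepted silently"

# The global flag (?i) appears after 'he', not at the start of the pattern.
print(compile_with_late_flag(r"he(?i)llo"))
```

The printed result differs between interpreter versions, which is the point of the probe; moving (?i) to the front of the pattern, or using a scoped group, avoids the issue entirely.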

Here is the non-global method:

q="HeLLo WorLD"

re.match(r"(?i:h)eLLo\sWorLD", q) # matches, since only enabled for 'h'
re.match(r"(?i:h)eLLo\sworld", q) # doesn't match: flag only applies to the group
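If scoped groups are needed often, they can be built with a tiny helper. The sketch below assumes the case-insensitive part is a literal string; the helper name ci is hypothetical and not part of the re module.

```python
import re

def ci(literal):
    # Hypothetical helper: wrap a literal in a scoped case-insensitive group.
    # re.escape keeps any regex metacharacters in the literal from being interpreted.
    return "(?i:" + re.escape(literal) + ")"

q = "HeLLo WorLD"
pattern = ci("hello") + r"\sWorLD"

print(pattern)                                  # (?i:hello)\sWorLD
print(bool(re.match(pattern, q)))               # True: 'hello' is matched case-insensitively
print(bool(re.match(pattern, "HeLLo world")))   # False: 'WorLD' is outside the group
```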

Some combinations are redundant, but the engine handles them fine:

q="HeLLo WorLD"

re.match(r"(?i:h)e(?-i:ll)o\sWorLD", q) # this fails ('ll' != 'LL'); disabling the flag is redundant, since it was never enabled outside the first group
re.match(r"(?i)(?i:h)e(?-i:LL)o\sworld", q) # this matches, but enabling the flag in the first group is redundant, since it's enabled globally

Note: tested on Python 3.8. Older versions may have handled this slightly differently; the scoped forms (?i:...) and (?-i:...) were only added in Python 3.6.
