.NET character class subtraction support in Python

Time:10-26

I have this regex ("^[-A-Z0-9-[O]]{1,8}$") coming from a client requirement (so normally it should not be changed). It doesn't work in Python, although it works in C#.

from re import search

var = "MY01C0DE"
regex = "^[-A-Z0-9-[O]]{1,8}$"

print(search(regex, var))

This prints None.

But if I change the regex to "^[-A-NP-Z0-9]{1,8}$", it works:

from re import search

var = "MY01C0DE"
regex = "^[-A-NP-Z0-9]{1,8}$"

print(search(regex, var))

So basically the -[O] part doesn't work in Python, if I understand correctly. But I have checked, and this regex works in C#. Is there any way to make this way of excluding characters (-[O]) work in Python as well?

Answer:

You can use the PyPI regex module, which supports .NET-like character class subtraction, and pass the V1 flag:

from regex import search, V1

var = "MY01C0DE"
regex = "^[-A-Z0-9-[O]]{1,8}$"
print(search(regex, var, V1))
# => <regex.Match object; span=(0, 8), match='MY01C0DE'>

See the Python demo.
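If adding a third-party dependency is not an option, the subtraction can also be emulated in the standard re module with a negative lookahead. This is only a sketch: it assumes the pattern itself may be rewritten, which the client requirement may not allow, and the name is_valid is made up for illustration. Since the sample string MY01C0DE contains zeros but no letter O, it is worth testing any approach against an input that does contain O, such as HELLO.

```python
import re

# Assumed stdlib equivalent of the .NET class [-A-Z0-9-[O]]:
# the lookahead (?!O) rejects the letter "O" before the base class
# [-A-Z0-9] consumes one character.
PATTERN = re.compile(r"^(?:(?!O)[-A-Z0-9]){1,8}$")


def is_valid(code: str) -> bool:
    """Return True if code is 1 to 8 allowed characters with no letter O."""
    return PATTERN.search(code) is not None


print(is_valid("MY01C0DE"))  # True: zeros only, no letter O
print(is_valid("HELLO"))     # False: contains the letter O
```

The lookahead form keeps the base class and the excluded character visibly separate, which may be easier to compare against the original .NET pattern than a hand-split range like A-NP-Z.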

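Check the "Nested sets and set operations" section of the regex module documentation, which describes set difference written with a double hyphen (--):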

For example, the pattern [[a-z]--[aeiou]] is treated in the version 0 behaviour (simple sets, compatible with the re module) as:

  • Set containing “[” and the letters “a” to “z”
  • Literal “--”
  • Set containing letters “a”, “e”, “i”, “o”, “u”
  • Literal “]”

but in the version 1 behaviour (nested sets, enhanced behaviour) as:

  • Set which is:
    • Set containing the letters “a” to “z”
  • but excluding:
    • Set containing the letters “a”, “e”, “i”, “o”, “u”

Version 0 behaviour: only simple sets are supported.
Version 1 behaviour: nested sets and set operations are supported.
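
To see the version 0 reading in practice, here is a small sketch using only the standard re module, which handles sets in the same simple way. It also shows why the original pattern from the question returned None: re reads it as a set followed by one to eight literal ] characters. The variable names simple and client are just for illustration.

```python
import re
import warnings

with warnings.catch_warnings():
    # re emits a FutureWarning ("Possible nested set") for a set that
    # starts with "[", so silence it for this demonstration.
    warnings.simplefilter("ignore", FutureWarning)
    simple = re.compile(r"[[a-z]--[aeiou]]")
    client = re.compile(r"^[-A-Z0-9-[O]]{1,8}$")

# Read as: set "[[a-z]", literal "--", set "[aeiou]", literal "]".
print(simple.fullmatch("b--e]") is not None)  # True
print(simple.fullmatch("b") is not None)      # False

# Read as: set "[-A-Z0-9-[O]" followed by one to eight literal "]".
print(client.search("MY01C0DE"))              # None
print(client.search("O]]") is not None)       # True
```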
