I'm trying to match type annotations like int | str, and use regex substitution to replace them with a string Union[int, str].
Desired substitutions (before and after):
str|int|bool
->Union[str,int,bool]
Optional[int|tuple[str|int]]
->Optional[Union[int,tuple[Union[str,int]]]]
dict[str | int, list[B | C | Optional[D]]]
->dict[Union[str,int], list[Union[B,C,Optional[D]]]]
The regular expression I've come up with so far is as follows:
r"\w*(?:\[|,|^)[\t ]*((?'type'[a-zA-Z0-9_.\[\]]+)(?:[\t ]*\|[\t ]*(?&type))+)(?:\]|,|$)"
You can try it out here on Regex Demo. It's not really working how I'd want it to. The problems I've noted so far:
It doesn't seem to handle nested Union conditions so far. For example, int | tuple[str|int] | bool seems to result in one match, rather than two matches (including the inner Union condition).
The regex seems to consume an unnecessary ] at the end.
Probably the most important one: I noticed the regex subroutines don't seem to be supported by the re module in Python. Here is where I got the idea to use that from.
Additional Info
This is mainly to support the PEP 604 syntax on Python 3.7, which requires annotations to be forward-declared (e.g. declared as strings), as otherwise builtin types don't support the | operator.
Here's a sample code that I came up with:
from __future__ import annotations
import datetime
from decimal import Decimal
from typing import Optional
class A:
    field_1: str|int|bool
    field_2: int | tuple[str|int] | bool
    field_3: Decimal|datetime.date|str
    field_4: str|Optional[int]
    field_5: Optional[int|str]
    field_6: dict[str | int, list[B | C | Optional[D]]]
class B: ...
class C: ...
class D: ...
For Python versions earlier than 3.10, I use a __future__ import to avoid the error below:
TypeError: unsupported operand type(s) for |: 'type' and 'type'
This essentially converts all annotations to strings, as below:
>>> A.__annotations__
{'field_1': 'str | int | bool', 'field_2': 'int | tuple[str | int] | bool', 'field_3': 'Decimal | datetime.date | str', 'field_4': 'str | Optional[int]', 'field_5': 'Optional[int | str]', 'field_6': 'dict[str | int, list[B | C | Optional[D]]]'}
But in code (say in another module), I want to evaluate the annotations in A. This works in Python 3.10, but fails in Python 3.7 even though the __future__ import supports forward-declared annotations.
>>> from typing import get_type_hints
>>> hints = get_type_hints(A)
Traceback (most recent call last):
eval(self.__forward_code__, globalns, localns),
File "<string>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'type' and 'type'
It seems the best approach to make this work is to replace all occurrences of int | str (for example) with Union[int, str]. Then, with typing.Union included in the additional localns used to evaluate the annotations, it should be possible to evaluate PEP 604-style annotations on Python 3.7.
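As a minimal sketch of that last step (not part of my actual code): once an annotation string is in Union[...] form, typing.get_type_hints should be able to evaluate it when Union is supplied through localns. The class name Example, its fields, and the extra_names dict are made up for illustration, and the strings are written by hand here rather than produced by a substitution.

```python
import typing


class Example:
    # Hand-written string annotations, already in Union[...] form
    # (stand-ins for what the regex substitution is meant to produce).
    field_1: "Union[str, int, bool]"
    field_5: "Optional[Union[int, str]]"


# Only the typing module itself is imported above, so the bare names
# Union and Optional are resolved through the localns mapping.
extra_names = {"Union": typing.Union, "Optional": typing.Optional}
hints = typing.get_type_hints(Example, localns=extra_names)

print(hints["field_1"])
print(hints["field_5"])
```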
CodePudding user response:
You can install the PyPI regex module (as re does not support recursion) and use
import regex
text = "str|int|bool\nOptional[int|tuple[str|int]]\ndict[str | int, list[B | C | Optional[D]]]"
rx = r"(\w+\[)(\w+(\[(?:[^][|]+|(?3))*])?(?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+)]"
n = 1
res = text
# Repeat until no bracketed union is left, so that inner unions are converted before outer ones
while n != 0:
    res, n = regex.subn(rx, lambda x: "{}Union[{}]]".format(x.group(1), regex.sub(r'\s*\|\s*', ',', x.group(2))), res)
# Finally, convert the remaining unions that are not wrapped in WORD[...]
print( regex.sub(r'\w+(?:\s*\|\s*\w+)+', lambda z: "Union[{}]".format(regex.sub(r'\s*\|\s*', ',', z.group())), res) )
Output:
Union[str,int,bool]
Optional[Union[int,tuple[Union[str,int]]]]
dict[Union[str,int], list[Union[B,C,Optional[D]]]]
See the Python demo.
The first regex finds all kinds of WORD[...] that contain pipe chars and other WORDs or WORD[...] with no pipe chars inside them.
The \w+(?:\s*\|\s*\w+)+ regex matches 2 or more words that are separated with pipes and optional spaces.
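To see the second, non-recursive pattern in isolation, here is a small sketch that runs it with the standard re module (it needs no recursion). The helper name replace_flat_unions is made up for this example. One caveat worth knowing: \w does not match a dot, so a dotted name such as datetime.date is only partly captured, as the last call shows.

```python
import re

# Two or more words separated by pipes and optional whitespace.
FLAT_UNION = re.compile(r"\w+(?:\s*\|\s*\w+)+")


def replace_flat_unions(text):
    """Wrap each flat 'A | B | C' run in Union[...] (hypothetical helper)."""
    def to_union(match):
        # Turn every pipe (with surrounding whitespace) into a comma.
        return "Union[{}]".format(re.sub(r"\s*\|\s*", ",", match.group()))
    return FLAT_UNION.sub(to_union, text)


print(replace_flat_unions("str|int|bool"))
print(replace_flat_unions("dict[str | int, list[B|C]]"))
# Dotted names are split at the dot, because \w does not match "."
print(replace_flat_unions("datetime.date|str"))
```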
The first pattern details:
(\w+\[) - Group 1 (this will be kept as is at the beginning of the replacement): one or more word chars and then a [ char
(\w+(\[(?:[^][|]+|(?3))*])?(?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+) - Group 2 (it will be put inside Union[...] with all \s*\|\s* pattern matches replaced with ,):
    \w+ - one or more word chars
    (\[(?:[^][|]+|(?3))*])? - an optional Group 3 that matches a [ char, followed with zero or more occurrences of either one or more chars other than [, ] and |, or the whole Group 3 recursed (hence, it matches nested square brackets), and then a ] char
    (?:\s*\|\s*\w+(\[(?:[^][|]+|(?4))*])?)+ - one or more occurrences (so the match contains at least one pipe char to replace with ,) of:
        \s*\|\s* - a pipe char enclosed with zero or more whitespaces
        \w+ - one or more word chars
        (\[(?:[^][|]+|(?4))*])? - an optional Group 4 (matches the same thing as Group 3, note the (?4) subroutine repeats Group 4 pattern)
] - a ] char.
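If installing a third-party module is not an option, the same rewrite can probably be done without regular expressions at all. The sketch below is a different technique from the regex above: it splits the annotation on pipes and commas that sit outside square brackets and recurses into each subscript. The names split_top_level and pipes_to_union are made up for this example, it assumes well-formed annotations with balanced brackets, and it does not handle string literals that contain |, [ or ] (e.g. inside Literal[...]). Its output puts a space after each comma, unlike the regex version.

```python
from typing import List


def split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators nested inside square brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts]


def pipes_to_union(annotation: str) -> str:
    """Rewrite PEP 604 unions (X | Y) in an annotation string as Union[X, Y]."""
    members = []
    for member in split_top_level(annotation, "|"):
        head, bracket, rest = member.partition("[")
        if bracket:
            # rest ends with the closing "]" of this subscript; drop it and
            # convert each comma-separated argument recursively.
            args = split_top_level(rest[:-1], ",")
            converted = ", ".join(pipes_to_union(arg) for arg in args)
            member = "{}[{}]".format(head, converted)
        members.append(member)
    if len(members) == 1:
        return members[0]
    return "Union[{}]".format(", ".join(members))


for sample in (
    "str|int|bool",
    "Optional[int|tuple[str|int]]",
    "dict[str | int, list[B | C | Optional[D]]]",
    "Decimal|datetime.date|str",
):
    print(pipes_to_union(sample))
```

Because names are never tokenized with \w, dotted names such as datetime.date pass through unchanged.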