The following code fragment comes from my GitHub repository found here. It opens a binary file and extracts the text within <header> tags. These are the crucial lines:
gbxfile = open(filename,'rb')
gbx_data = gbxfile.read()
gbx_header = b'(<header)((?s).*)(</header>)'
header_intermediate = re.findall(gbx_header, gbx_data)
The script works, but it triggers the following DeprecationWarning:
DeprecationWarning: Flags not at the start of the expression b'(<header)((?s).*)(</' (truncated)
header_intermediate = re.findall(gbx_header, gbx_data)
What is the correct use of the regular expression in gbx_header, so that this warning is not displayed?
CodePudding user response:
You can check the Python bug tracker, Issue 39394; the warning was introduced in Python 3.6.
The point is that Python's re now warns about inline modifiers that are not at the start of the pattern (and since Python 3.11 this is an error). In Python 2.x, you could use your pattern without any problems or warnings, because (?s) is silently applied to the whole regular expression under the hood. Since that is not always the expected behavior, the Python developers decided to produce a warning.
Note that you can now use inline modifier groups, such as (?s:...), in Python re; see "restrict 1 word as case sensitive and other as case insensitive in python regex | (pipe)".
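As a minimal sketch of such a modifier group applied to the pattern from the question (the sample bytes are made up for illustration and stand in for the real file contents), (?s:...) turns on DOTALL only inside the group, so it can sit in the middle of the pattern on Python 3.6 and later:

```python
import re

# Hypothetical sample data standing in for the contents of the binary file.
gbx_data = b'\x00\x01<header>\nline one\nline two\n</header>\x02'

# (?s:...) enables DOTALL only inside the group, so it may appear mid-pattern.
gbx_header = rb'(<header)((?s:.*))(</header>)'

header_intermediate = re.findall(gbx_header, gbx_data)
print(header_intermediate)
```

The three capturing groups of the original pattern are kept, so each match is still a 3-tuple.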
So, the solutions are:
- Putting (?s) (or any other inline modifier) at the start of the pattern: (?s)(<header)(.*)(</header>)
- Using the re option: re.S / re.DOTALL instead of (?s), re.I / re.IGNORECASE instead of (?i), etc.
- Using workarounds: instead of ., use [\w\W] / [\d\D] / [\s\S] if you do not want to use (?s) or re.S / re.DOTALL.
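The three options can be compared side by side. This is only a sketch: the sample bytes and the variable names start_flag, flag_arg and char_class are invented for illustration and are not taken from the original script.

```python
import re

# Hypothetical sample data standing in for the contents of the binary file.
gbx_data = b'\x00<header>\nfoo\n</header>\x01'

# 1. Inline modifier at the very start of the pattern.
start_flag = re.findall(rb'(?s)(<header)(.*)(</header>)', gbx_data)

# 2. The re.DOTALL (re.S) option passed as the flags argument instead of (?s).
flag_arg = re.findall(rb'(<header)(.*)(</header>)', gbx_data, re.DOTALL)

# 3. A character class that matches any character, newlines included.
char_class = re.findall(rb'(<header)([\s\S]*)(</header>)', gbx_data)

# All three variants should return the same list of matches.
print(start_flag == flag_arg == char_class)
```

Which one to pick is mostly a matter of taste: the flags argument keeps the pattern itself free of modifiers, while the inline form keeps everything in one string.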