I have a string like that
<Directory />
Options FollowSymLinks
AllowOverride None
Require all denied
Order deny,allow
Deny from all
</Directory>
<Directory /usr/share>
AllowOverride None
Require all granted
</Directory>
<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>
I want to extract everything between <Directory ...> and </Directory>.
Like this
[
["Options FollowSymLinks", "AllowOverride None", "Require all denied",
"Order deny,allow","Deny from all"],
["AllowOverride None", "Require all granted"],
["Options Indexes FollowSymLinks", "AllowOverride None", "Require all granted"]
]
What's regex i can use in python3? Thank you.
CodePudding user response:
https://stackoverflow.com/a/1732454/339482
tl;dr: don't use regex to parse context-free languages, use a real parser instead. Usually there is something out there already if it's a popular format such as Apache config files. A quick search on PyPI suggests you can use apacheconfig
:
CodePudding user response:
You could use (?s)<Directory />\n(.*?)\n</Directory>
for extracting whatever is within a Directory
pair of tags. [Demo]
You should then go and and every one of those matches to a list.
Explanation:
(?s)
is the multiline modifier; it will make.
to match newlines as well.(.*?)
will match everything (including newlines) between<Directory />\n
and the next\n</Directory>
.