Home > Software engineering >  Regex match everything between open and close tag
Regex match everything between open and close tag

Time:06-14

I have a string like that

<Directory />
    Options FollowSymLinks
    AllowOverride None
    Require all denied
    Order deny,allow
    Deny from all
</Directory>
<Directory /usr/share>
    AllowOverride None
    Require all granted
</Directory>
<Directory /var/www/>
    Options Indexes FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

I want to extract everything between <Directory ...> and </Directory>.

Like this

[
   ["Options FollowSymLinks", "AllowOverride None", "Require all denied", 
    "Order deny,allow","Deny from all"],
   ["AllowOverride None", "Require all granted"],
   ["Options Indexes FollowSymLinks", "AllowOverride None", "Require all granted"]
]

What's regex i can use in python3? Thank you.

CodePudding user response:

https://stackoverflow.com/a/1732454/339482

tl;dr: don't use regex to parse context-free languages, use a real parser instead. Usually there is something out there already if it's a popular format such as Apache config files. A quick search on PyPI suggests you can use apacheconfig:

CodePudding user response:

You could use (?s)<Directory />\n(.*?)\n</Directory> for extracting whatever is within a Directory pair of tags. [Demo]

You should then go and and every one of those matches to a list.

Explanation:

  • (?s) is the multiline modifier; it will make . to match newlines as well.
  • (.*?) will match everything (including newlines) between <Directory />\n and the next \n</Directory>.
  • Related