Regex: Remove empty sections from INI (Text) files where the files could contain multiple such sections

I have the php.ini file below, which is what remained after I removed the comments and spaces with a few simple regex find/replace steps:

[PHP]
engine = On
short_open_tag = Off
....
....
[CLI Server]
cli_server.color = On
[Date]
[filter]
[iconv]
[imap]
[intl]
[sqlite3]
[Pcre]
[Pdo]
[Pdo_mysql]
pdo_mysql.default_socket=
[Phar]
[mail function]
SMTP = localhost
smtp_port = 25
mail.add_x_header = Off
[ODBC]
....
....
[dba]
[opcache]
[curl]
curl.cainfo = "{PHP_Dir_Path}\extras\curl\cacert.pem"
[openssl]
[ffi]

As you can see, the file contains many empty sections (sections that have no uncommented lines, i.e. no lines without a leading semicolon), often several in a row. I can't come up with a regex find/replace pattern that removes all such empty sections in one go, so that the file ends up like this:

[PHP]
engine = On
short_open_tag = Off
....
....
[CLI Server]
cli_server.color = On
[Pdo_mysql]
pdo_mysql.default_socket=
[mail function]
SMTP = localhost
smtp_port = 25
mail.add_x_header = Off
[ODBC]
....
....
[curl]
curl.cainfo = "{PHP_Dir_Path}\extras\curl\cacert.pem"

Can anyone help me achieve this?

CodePudding user response：

One idea is to look ahead, after a line starting with [, for another opening bracket (or the end of the text).

^\[.*+\s*+(?![^\[])

Here is the demo at regex101 - If using NP++ uncheck: [ ] . matches newline

^ line start (NP++ default)
\[ matches an opening bracket
.*+ any amount of any characters besides newline (without giving back)
\s*+ any amount of whitespace (also possessive to reduce backtracking)
(?! negative lookahead to fail on the defined condition ) which is:
[^\[] a character that is not an opening bracket

In short, it matches a line starting with [ up to the end of the line, plus any amount of whitespace...
if there is either no character ahead or the next character is another [ opening bracket.
Side note: Its positive equivalent is ^\[.*+\s*+(?=\[|\z) where \z matches end of string.
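
For anyone applying the same lookahead idea from a script rather than an editor, here is a minimal Python sketch. It is an adaptation, not the pattern above verbatim: it assumes section headers have the form [name] on a line of their own, swaps the possessive quantifiers for a character class that cannot backtrack past the closing bracket (older Python versions reject possessive quantifiers), and uses \Z, Python's end-of-string anchor. The function name and the sample text are made up for illustration.

```python
import re

# A header line, then any whitespace, followed either by the next "[" or by
# the end of the text. [^\]\n]* stops at the first "]" on the line, so the
# match cannot shrink back into the middle of a header.
EMPTY_SECTION = re.compile(r"^\[[^\]\n]*\]\s*(?=\[|\Z)", re.MULTILINE)


def remove_empty_sections(ini_text: str) -> str:
    """Drop every section header that has no key lines below it."""
    return EMPTY_SECTION.sub("", ini_text)


sample = """\
[PHP]
engine = On
[CLI Server]
cli_server.color = On
[Date]
[filter]
[Pdo_mysql]
pdo_mysql.default_socket=
[Phar]
[mail function]
SMTP = localhost
[openssl]
[ffi]
"""

if __name__ == "__main__":
    print(remove_empty_sections(sample), end="")
```

Headers such as [Date], [filter], [Phar], [openssl] and [ffi] are removed, while headers that are followed by at least one key line stay in place.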

CodePudding user response：

You can try to match if there is a ] followed by a new line and then a [, with the following regex:

\]\n\[

EDIT:

As pointed out in your comment, that would just match the ][ characters, so you could try this instead:

(\[(\w)+\]\n)(?!\w)

This will match a section title that is not followed by a word character at the start of the next line.

EDIT2:

My previous answer would not get the last section if it was empty, so I changed it to check the newline OR end of file.

(\[(\w)+\])(\n(?!\w)|$)
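
One thing worth checking with this pattern is how $ is interpreted. The sketch below assumes Python's re module, where $ without flags matches only at the end of the text (or just before a final newline), while with the MULTILINE flag it matches at the end of every line. The short sample text is made up for illustration. Note also that \w does not match spaces, so a header like [mail function] would never be matched by this pattern, even if it were empty.

```python
import re

PATTERN = r"(\[(\w)+\])(\n(?!\w)|$)"
text = "[PHP]\nengine = On\n[Date]\n[ffi]\n"

# Default flags: only the empty [Date] and [ffi] headers are removed.
without_flag = re.sub(PATTERN, "", text)

# MULTILINE: "$" now matches at the end of the [PHP] line as well, so the
# second alternative succeeds there and the non-empty header is removed too.
with_multiline = re.sub(PATTERN, "", text, flags=re.MULTILINE)

if __name__ == "__main__":
    print(repr(without_flag))
    print(repr(with_multiline))
```

So if the tool in use treats $ as end of line, the |$ alternative needs to be replaced by an end-of-text anchor.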

CodePudding user response：

You need to tell your regex-engine to use the single-line aka "dotall" mode. Then you can easily pick out any bracketed strings that are only separated by a newline:

 /\[[^\]]+\]\s\[[^\]]+\]/gs

The s flag enables "dotall" mode.

Update: Overlooked one obvious problem with my solution. It gets a bit more complicated now, using a lookahead (?:\s(?=\[)). Also extra caution needs to be taken to capture the last empty section, which is done with the |$ part. Regexr link updated...

 /\[[^\]]+\](?:\s(?=\[)|$)/gs
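
A rough Python port of this last pattern, as a sketch under a few assumptions: the function name is hypothetical, the sample text is made up, and the CRLF normalisation is an addition of mine, not part of the answer. It is there because the single \s consumes only one character, so on Windows line endings it would take the \r and leave the \n, rather than a [, in front of the lookahead.

```python
import re

# Same pattern as above, without the JavaScript-style delimiters and flags.
EMPTY_SECTION = re.compile(r"\[[^\]]+\](?:\s(?=\[)|$)")


def drop_empty_sections(text: str) -> str:
    # Normalise CRLF to LF so that the one-character \s lines up with the
    # lookahead for the next "[".
    text = text.replace("\r\n", "\n")
    return EMPTY_SECTION.sub("", text)


sample = (
    "[PHP]\r\nengine = On\r\n"
    "[Date]\r\n[filter]\r\n"
    "[Pdo_mysql]\r\npdo_mysql.default_socket=\r\n"
    "[ffi]"
)

if __name__ == "__main__":
    print(drop_empty_sections(sample), end="")
```

Because the pattern is not anchored to the start of a line, it would also match a bracketed value at the end of a key line if the next line is a header; that is not an issue for the file shown in the question.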