RegEx to format Wikipedia's infoboxes code [SOLVED]

Time:11-18

I am a contributor to Wikipedia and I would like to make a script with AutoHotKey that could format the wikicode of infoboxes and other similar templates.

Infoboxes are templates that display a box on the side of articles and show the values of the parameters entered (there are many of them, and they differ in number, length and type of characters used depending on the infobox).

Parameters are always preceded by a pipe (|) and end with an equal sign (=). On rare occasions, multiple parameters can be put on the same line, but I can sort this manually before running the script.

A typical infobox will be like this:

{{Infobox XYZ
 | first parameter  = foo
 | second_parameter = 
 | 3rd parameter    = bar
 | 4th              = bazzzzz
 | 5th              = 
 | etc.             = 
}}

But sometimes (lazy) contributors write them like this:

{{Infobox XYZ
|first parameter=foo
|second_parameter= 
|3rd parameter=bar
|4th=bazzzzz
|5th= 
|etc.= 
}}

Which isn't very easy to read and modify.

I would like to know if it is possible to make a regex (or a series of regexes) that would transform the second example into the first.

The lines should start with a space, then a pipe, then another space, then the parameter name, then any number of spaces (to match the length of the other lines), then an equal sign, then another space, and if present, the parameter value.

I tried a few things using multiple capturing groups, but I'm getting nowhere... (I'm even ashamed to show my attempts, as they really don't work).

Would someone have an idea on how to make it work?

Thank you for your time.

CodePudding user response:

The lines should start with a space, then a pipe, then another space, then the parameter name, then a space, then an equal sign, then another space, and if present, the parameter value.

First the selection, which is relatively trivial (enable the multiline flag so that ^ and $ match at each line):

^\s*\|\s*([^=]*?)\s*=(.*)$

Then the replacement, literally your description of what you want (note the space at the beginning):

 | $1 = $2

See it in action here.
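
For readers who want to try the substitution outside a regex tester, here is a minimal sketch in Python (not AutoHotkey, which the question targets). The names PARAM_LINE and normalize are made up for this illustration. It applies the pattern and replacement above unchanged, and it shows that every name ends up followed by exactly one space, so the equal signs are not aligned.

```python
import re

# Pattern from the answer above; re.MULTILINE makes ^ and $ match at each line.
PARAM_LINE = re.compile(r"^\s*\|\s*([^=]*?)\s*=(.*)$", re.MULTILINE)

def normalize(wikicode: str) -> str:
    """Rewrite each '|name=value' line as ' | name = value' (no alignment)."""
    return PARAM_LINE.sub(r" | \1 = \2", wikicode)

sample = "{{Infobox XYZ\n|first parameter=foo\n|4th=bazzzzz\n}}"
print(normalize(sample))
```

Note that the second group captures everything after the equal sign, including any leading space, so a value that already starts with a space will end up with two.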

CodePudding user response:

@Blindy:

The best code I have found so far is the following: https://regex101.com/r/GunrUg/1

The problem is it doesn't align the equal signs vertically...

CodePudding user response:

I got an answer on AutoHotKey forums:

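^i::
out := ""
Send, ^x  ; cut the selected infobox code to the clipboard
regex := "O)\s*\|\s*(.*?)\s*=\s*(.*)", width := 1
; First pass: find the length of the longest parameter name
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        width := Max(width, StrLen(_[1]))
; Second pass: rebuild each parameter line, padding the name to that width
Loop, Parse, Clipboard, `n, `r
    If RegExMatch(A_LoopField, regex, _)
        out .= Format(" | {:-" width "} = {2}", _[1], _[2]) "`n"
    else
        out .= A_LoopField "`n"  ; lines without a parameter are kept as-is
Clipboard := out
Send, ^v  ; paste the formatted code back
Return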

With this script, pressing Ctrl+I formats the selected infobox code just right (I guess a simple regex isn't enough to do the job).
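
The reason one substitution is not enough is that the padding of each line depends on the longest parameter name in the whole infobox, so the text has to be read twice. Below is a rough Python sketch of the same two-pass idea, for anyone who wants to test it without AutoHotkey. It is not a translation of the script above, and the names PARAM and align_infobox are invented for this example.

```python
import re

# Same shape as the AutoHotkey regex: optional spaces, pipe, name, '=', value.
PARAM = re.compile(r"^\s*\|\s*(.*?)\s*=\s*(.*)$")

def align_infobox(wikicode: str) -> str:
    lines = wikicode.splitlines()
    matches = [PARAM.match(line) for line in lines]
    # Pass 1: the longest parameter name decides the column of the '='.
    width = max((len(m.group(1)) for m in matches if m), default=1)
    out = []
    # Pass 2: rebuild parameter lines, left-aligning names to that width.
    for line, m in zip(lines, matches):
        if m:
            out.append(f" | {m.group(1):<{width}} = {m.group(2)}")
        else:
            out.append(line)  # '{{Infobox XYZ', '}}' and other lines pass through
    return "\n".join(out)

messy = "\n".join([
    "{{Infobox XYZ",
    "|first parameter=foo",
    "|second_parameter= ",
    "|4th=bazzzzz",
    "}}",
])
print(align_infobox(messy))
```

Like the script, this assumes one parameter per line and does not try to handle pipes or equal signs inside values.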
