How to match the beginning and end of a plain text table ( --- --- ) with regex?

Time:03-21

Plain text tables as exported by pandoc look like this:

+------+-------+
| x    | y     |
+======+=======+
| 1    | 4     |
| 2    | 5     |
| 3    | 6     |
+------+-------+

Goal: find such tables and surround them with "```" on the line before and the line after the table:

```
+------+-------+
| x    | y     |
+======+=======+
| 1    | 4     |
| 2    | 5     |
| 3    | 6     |
+------+-------+
```

(I escaped the "`" otherwise it marks code block boundaries and ends the just started code block)

I can find the horizontal cell dividers with the regex ^\+.*\+$.

...but I suppose I need to find the top and bottom ones with look-aheads and look-behinds, checking that the previous line (for the top divider) or the next line (for the bottom divider) contains no | characters, which mark the borders of the table. But I can't figure out how. Does anyone have any ideas?

CodePudding user response:

You can try matching:

\+-+\+-+\+[\s\S]+?\+-+\+-+\+

then replace with:

`$&`

Explanation - all those plus signs :-)

\+ - match a literal +

-+ - match a hyphen one or more times

\+ - match a literal +

-+ - match a hyphen one or more times

\+ - match a literal +

[\s\S]+? - match any character one or more times (non-greedy, i.e. as few as possible); this includes newlines

\+ - match a literal +

-+ - match a hyphen one or more times

\+ - match a literal +

-+ - match a hyphen one or more times

\+ - match a literal +

Now replace with the whole match surrounded with back-ticks

Set the global flag to replace all.
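
For readers working in Python, the match-and-replace described above could be sketched roughly as follows. This is an illustration, not the answerer's own code: the helper name `fence_tables` and the sample text are made up here. `re.sub` replaces every match by default, so no separate global flag is needed. Note that the pattern is written for borders with exactly two columns; wider tables would need more `-+\+` groups.

```python
import re

# Two-column border, then as little as possible (newlines included),
# then the next two-column border.
TABLE_RE = re.compile(r"\+-+\+-+\+[\s\S]+?\+-+\+-+\+")

# Three backticks, built at runtime so this example contains no literal fence.
FENCE = "`" * 3


def fence_tables(text: str) -> str:
    """Surround every matched table with fence lines (hypothetical helper)."""
    # m.group(0) is the whole match, the Python equivalent of $& in the answer.
    return TABLE_RE.sub(lambda m: f"{FENCE}\n{m.group(0)}\n{FENCE}", text)


# Made-up sample document containing one two-column table.
sample = "\n".join([
    "intro",
    "+------+-------+",
    "| x    | y     |",
    "+======+=======+",
    "| 1    | 4     |",
    "+------+-------+",
    "outro",
])

print(fence_tables(sample))
```

Because the middle part is non-greedy, each match stops at the nearest closing border, so two tables in the same text are fenced separately.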

CodePudding user response:

You haven't said what programming language you are working with.
The following example works in Python.
Regex pattern, using the s flag so that . also matches newlines.
As the header line begins at the start of a line (^) and the footer line ends at a line end ($), we shouldn't have any problems.
I have forbidden the string -+- to avoid capturing two tables at once, with everything in between, when there are two tables on the same page.

"^(\+(-+\+)+((?!(-+-)).)*\+(-+\+)+)$"gms

Substituting with

```\n\1\n```

See https://regex101.com/r/PslCIX/2
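
A minimal Python sketch of this substitution, assuming the pattern above with its stripped `+` characters restored. The helper name `fence_all_tables` and the sample document are hypothetical. The `m` and `s` flags map to `re.MULTILINE` and `re.DOTALL`, and `re.sub` already replaces all matches, which covers the `g` flag.

```python
import re

# MULTILINE anchors ^ and $ at line boundaries; DOTALL lets . cross newlines.
TABLE_RE = re.compile(
    r"^(\+(-+\+)+((?!(-+-)).)*\+(-+\+)+)$",
    re.MULTILINE | re.DOTALL,
)

# Three backticks, built at runtime so this example contains no literal fence.
FENCE = "`" * 3

# Replacement template: fence, newline, captured table (group 1), newline, fence.
REPLACEMENT = FENCE + r"\n\1\n" + FENCE


def fence_all_tables(text: str) -> str:
    """Wrap each table in fence lines (hypothetical helper)."""
    return TABLE_RE.sub(REPLACEMENT, text)


# Made-up document with a two-column and a three-column table.
doc = "\n".join([
    "first",
    "+------+-------+",
    "| x    | y     |",
    "+------+-------+",
    "between",
    "+---+---+---+",
    "| a | b | c |",
    "+---+---+---+",
    "last",
])

print(fence_all_tables(doc))
```

Unlike a fixed two-column pattern, `(-+\+)+` accepts any number of columns. One consequence of the lookahead is worth knowing: written unescaped, `-+-` is itself a regex that matches any run of two or more hyphens, so a table whose cells contain `--` would not be matched at all.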
