So here is my problem
Sample input (file.txt)
all chars possible
[[example.org]][1500]
all chars possible
[[example.org]][1318]
all chars possible
[1318]: https://web.example.org/web/https://example.com
[1500]: https://web.example.org/web/https://example.net
all chars possible
Sample desired output (file.txt)
all chars possible
[[example.org]](https://web.example.org/web/https://example.net)
all chars possible
[[example.org]](https://web.example.org/web/https://example.com)
all chars possible
(basically again every special char)
For the first match, the brackets are constant and the number is variable, 1 to 4 digits long: [[example.org]][variable number]
For the second match, the brackets and the colon are constant. The number is again 1 to 4 digits long and matches the number from the first match (it is a Markdown reference). It is followed by a space and then a URL, which can contain any valid URL character but always starts with https://web.example.org/web/
So I want to replace
[[example.org]][1500]
with (for every matching number, without being tripped up by the many special characters in the file)
[[example.org]](URL)
In this case
[[example.org]](https://web.example.org/web/https://example.net)
Then remove the line [1500]: https://web.example.org/web/https://example.net
once the replacement is done,
Ending with:
(tons of special chars/text/url)
[[example.org]](https://web.example.org/web/https://example.net)
(tons of special chars/text/url)
The first regex I made is:
\[\[example\.org\]\]\[([0-9]{1,4})\]
Capture group 1 captures the number (1500)
The second regex I made is:
\[1500\]: (.*)
Capture group 1 gets the URL, in a crude way
But it should reuse the number captured by group 1 of the first regex instead of a hard-coded one, so I ended up with this:
\[\1\]: (.*)
I want to end up with the first match rewritten as follows, with the URL in place of the numbered reference. This is where I'm stuck.
[[example.org]](captured URL)
Followed by removal of the entire definition line (leaving no blank line behind)
[1500]: URL
I tried some JS but got stuck because the special characters in the document kept breaking things. Any help would be greatly appreciated. I don't know whether this is possible with a simple regex search/replace, though I'm sure there must be an easy way to do it, maybe using sed. This is for our open-source project, and I admit I'm not the best at this.
CodePudding user response:
Using GNU sed
$ sed -Ez ':a;s/(\[\[[^[]*)(\[[^\n]*)(.*)\2: ([^\n]*)(\n.*$)?/\1(\4)\3/;ta' file.txt
all chars possible
[[example.org]](https://web.example.org/web/https://example.net)
all chars possible
[[example.org]](https://web.example.org/web/https://example.com)
all chars possible
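For anyone unpicking the one-liner, here is the same script spread over several lines with comments. It is only an annotated sketch: the substitution itself is unchanged, the wrapper function name resolve_refs is made up for illustration, and GNU sed is still assumed (for -E and -z). As far as I can tell, group 5 is matched but never written back, so whatever follows a matched definition line is dropped; that appears to be why the last "all chars possible" line of the sample input is absent from the output above.

```shell
# Hypothetical wrapper around the GNU sed command above; prints to stdout.
resolve_refs() {
  sed -Ez '
    # Loop start: repeat the substitution until nothing is left to resolve.
    :a
    # \1  the label with its brackets, e.g. [[example.org]]
    # \2  the reference, e.g. [1500] (runs to the end of that line)
    # \3  everything between the reference and its definition line
    # \4  the URL that follows "[1500]: " on the definition line
    # \5  the remainder of the input after the URL; it is not reinserted
    s/(\[\[[^[]*)(\[[^\n]*)(.*)\2: ([^\n]*)(\n.*$)?/\1(\4)\3/
    # Jump back to :a if the substitution above changed anything.
    ta
  ' "$1"
}
```

Called as resolve_refs file.txt, it should behave like the one-liner; the file itself is not modified.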
CodePudding user response:
An awk solution would be a better fit here:
awk -F ': ' '
FNR==NR {                # first pass: collect the "[N]: URL" definitions
    if (NF==2)
        map[$1] = $2
    next
}
$2 in map {              # second pass: swap the reference for its URL
    $2 = "]](" map[$2] ")"
}
!($1 in map)             # print every line except the definitions
' file.txt FS=']]|: ' OFS= file.txt
all chars possible
[[example.org]](https://web.example.org/web/https://example.net)
all chars possible
[[example.org]](https://web.example.org/web/https://example.com)
all chars possible
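A variant sketch, in case a reference can sit in the middle of a line rather than alone on it. It makes the same two passes over the file but uses match() instead of field splitting, so the rest of each line is left untouched. The function name inline_refs and the array name url are made up for illustration, and it assumes every definition line starts with [digits] followed by a colon and a space. Unlike the pattern in the question, it rewrites any ]][N] reference, not only those after [[example.org]].

```shell
# Hypothetical helper: reads the file twice and prints the result to stdout.
inline_refs() {
  awk '
    # Pass 1: remember each definition line "[N]: URL".
    FNR == NR {
      if (match($0, /^\[[0-9]+\]: /)) {
        id = substr($0, 2, RLENGTH - 4)     # digits between the brackets
        url[id] = substr($0, RLENGTH + 1)   # everything after ": "
      }
      next
    }
    # Pass 2: skip definition lines that were recorded in pass 1.
    match($0, /^\[[0-9]+\]: /) && (substr($0, 2, RLENGTH - 4) in url) { next }
    {
      out = ""
      rest = $0
      # Rewrite every "]][N]" on the line; unknown numbers are kept as-is.
      while (match(rest, /\]\]\[[0-9]+\]/)) {
        id = substr(rest, RSTART + 3, RLENGTH - 4)
        out = out substr(rest, 1, RSTART + 1)   # up to and including "]]"
        if (id in url)
          out = out "(" url[id] ")"
        else
          out = out substr(rest, RSTART + 2, RLENGTH - 2)
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }
  ' "$1" "$1"
}
```

Called as inline_refs file.txt, it prints the converted text; the file itself is not modified, so redirect to a new file if the result looks right.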