This is my first question on Stack Overflow.
I am trying to write a Bash script that converts the links GitHub Wiki generates for its own internal pages into conventional Markdown-style links.
The GitHub Wiki link strings look like this:
[[An example of another page]]
I want to convert it to look like this:
[An example of another page](An-example-of-another-page.htm)
Each document contains an unknown number of these links, and I don't know their content in advance.
So far I have been experimenting with one-line sed solutions given for similar problems, like this one:
https://askubuntu.com/questions/1283471/inserting-text-to-existing-text-within-brackets
... with absolutely no success. I'm not even sure where to start with it.
Thanks.
CodePudding user response:
You can use Bash's built-in regular expression support to find instances of wiki-linked [[text]] and replace them with [text](text.htm). The pattern you want to use is \[\[([^\]]*)\]\], where:
- \[ and \] escape the left and right square brackets so that they aren't interpreted as the metacharacters that delimit character classes
- ([^\]]*) captures all text inside the double brackets up to the first right square bracket
From there you can evaluate this regex and use the $BASH_REMATCH array to check whether any match was made. You'll need to run this multiple times in order to match all instances in the string, and then replace the string inline using the / and // operators.
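Before the full script, a minimal sketch of the two building blocks may help: a single =~ match that fills BASH_REMATCH, and the difference between the / and // operators. The variable names (wiki_link_re, line, spaced and so on) are made up for this illustration, and the regex is kept in a variable only to make the escaping easier to read.

```shell
#!/usr/bin/env bash
# Illustrative names only; none of these variables come from the answer.
wiki_link_re='\[\[([^]]*)\]\]'
line='See [[An example of another page]] for details.'

# An unquoted $wiki_link_re on the right of =~ is treated as a regex.
if [[ $line =~ $wiki_link_re ]]; then
    whole=${BASH_REMATCH[0]}   # the full match, brackets included
    inner=${BASH_REMATCH[1]}   # the text captured by the parentheses
    printf 'matched:  %s\ncaptured: %s\n' "$whole" "$inner"
fi

# ${var/old/new} replaces the first occurrence, ${var//old/new} replaces all.
spaced='a b c'
first_only=${spaced/ /-}
all_spaces=${spaced// /-}
printf '%s\n%s\n' "$first_only" "$all_spaces"
```

Only the first [[...]] on the line is matched here, which is why the script has to repeat the match in a loop.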
Here's a sample script:
#!/usr/bin/env bash
wiki_string="Now, this is [[a story]] all about how
My life [[got flipped-turned upside down]]
And I'd [[like to take a minute]]
Just [[sit]] right there
I'll [[tell you]] how I [[became the prince]] of a town called Bel-Air"
printf 'Original: %s\n' "$wiki_string"
# find the first instance of [[text]] and capture the text inside
# the square brackets
[[ "$wiki_string" =~ \[\[([^\]]*)\]\] ]]
# if successful, BASH_REMATCH will contain the matched text and the
# captured value inside the parentheses
while [[ ${#BASH_REMATCH[@]} == 2 ]]; do
    # escape the [ and ] characters so we can replace [[text]]
    # with our modified value
    replace_text="${BASH_REMATCH[0]}"
    replace_text="${replace_text/\[\[/\\[\\[}"
    replace_text="${replace_text/\]\]/\\]\\]}"
    # Get the matched value inside the brackets
    link_text="${BASH_REMATCH[1]}"
    # store another copy of the text with the spaces replaced
    # with dashes and appending .htm
    link_target="${link_text// /-}.htm"
    # Finally, replace the matched [[text]] with [text](text.htm)
    wiki_string="${wiki_string//$replace_text/[$link_text]($link_target)}"
    # Search the string again for the next instance of [[text]]
    [[ "$wiki_string" =~ \[\[([^\]]*)\]\] ]]
done
printf '\nUpdated: %s\n' "$wiki_string"
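The same loop can also be wrapped in a function, which makes it easier to reuse. The sketch below is one possible variation, not the script above: convert_wiki_links is a hypothetical name, and instead of escaping the brackets by hand it quotes the search pattern, which makes Bash treat it literally (so link text containing characters such as * should not be read as a glob).

```shell
#!/usr/bin/env bash
# convert_wiki_links is a hypothetical helper name chosen for this sketch.
convert_wiki_links() {
    local text=$1
    local re='\[\[([^]]*)\]\]'
    local match link_text link_target replacement

    # Keep matching until no [[text]] is left in the string.
    while [[ $text =~ $re ]]; do
        match=${BASH_REMATCH[0]}
        link_text=${BASH_REMATCH[1]}
        link_target="${link_text// /-}.htm"
        replacement="[$link_text]($link_target)"
        # Quoting "$match" makes the pattern literal; quoting
        # "$replacement" keeps any & in the link text literal as well.
        text=${text//"$match"/"$replacement"}
    done

    printf '%s\n' "$text"
}

convert_wiki_links 'Go to [[Home]] or [[An example of another page]].'
```

To convert a whole file, the function could be called once per line, for example from a `while IFS= read -r line` loop reading that file.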
CodePudding user response:
You can try this sed command:
$ sed -E 's/\[//;s/\]//;s/(.)(.[^]]*)(.)/\1\2\3(\2)/;s/(.[^\(]*)(\S*)\s(\S*)\s/\1\2-\3-/g' input_file
[An example of another page](An-example-of-another-page)
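The output above has no .htm suffix, and the command relies on the number of words in the link. As a hedged alternative, here is a sketch that uses a sed loop instead. It is a different technique from the command above: it rewrites every [[text]] on the line first, then replaces the spaces inside each link target one at a time. The wrapper name wiki_to_md is hypothetical. It also assumes the link text contains no ) character, and it would alter spaces in any other ](...) construct already on the line, such as a Markdown link with a title.

```shell
#!/usr/bin/env bash
# wiki_to_md is a hypothetical wrapper name used only for this sketch.
wiki_to_md() {
    # 1st expression: [[text]] -> [text](text.htm), for every link on the line.
    # Loop (:a ... ta): replace one space after "](" with a dash and
    # repeat until no substitution is made.
    sed -E \
        -e 's/\[\[([^]]*)\]\]/[\1](\1.htm)/g' \
        -e ':a' \
        -e 's/(\]\([^) ]*) /\1-/' \
        -e 'ta' \
        "$@"
}

printf '%s\n' 'See [[An example of another page]] and [[Home]].' | wiki_to_md
```

Called with a file name, as in `wiki_to_md input_file`, it reads that file instead of standard input.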