I have a set of Markdown posts for a Jekyll site, each of which contains a Markdown link. For example:
---
layout: post
title: "The Title"
date: 2022-07-31
categories:
- CategoryX
- CategoryY
author: AuthorName, SecondAuthor
tags: [tag1,tag2,tag3]
---
Some text that might contain (brackets] or other symbols.
[Visit Link](https://www.linkhere.net/somepage){:target="_blank" rel="noopener"}
I'd like to extract just the full URLs from each file in the _posts directory and write them to a new file.
This is my code, with my attempts so far commented out:
#!/bin/bash
# configuration
jekyll_post_dir="<jekyll_dir>/_posts"
for file in $jekyll_post_dir/*
do
#link=$(sed -n -e '/[Visit Link]/,/{:target/p' $file)
#link=$(sed -n '/[Visit Link]/,/target/{ /html>/d; p }' $file)
#link=$(awk '/[Visit Link]/,/target/' $file)
#link=$(sed -n 's/[^{]*\({[^}]*}\).*/\1/g' $file)
#link=$(sed 's/.*Link](\(.*\))/\1/' $file)
#link=$(awk -F"[()]" '{print $2}' $file )
#while IFS="](){" read a b; do echo "$b"; done < $file
#link=$(sed -n '/\](/,/)\{:/p' $file)
#echo $link >> linklist.txt
done
All my attempts have either selected unwanted text or failed completely. I am not familiar with regex or similar definitions so I would appreciate some guidance. I'm happy to use any bash-supported solution.
Thanks for reading/helping...
CodePudding user response:
The command below gets the expected URL
sed -nre '/:target=/ s/.*[]][(]([^)]+)[)][{]:target=.*/\1/p' test.txt
Result
https://www.linkhere.net/somepage
Alternative command
sed -nre '/:target=/ s/.*\]\(([^)]+)\)\{:target=.*/\1/p' test.txt
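To run this over every post, as the question asks, the same substitution can be wrapped in a loop. This is a minimal sketch, not a definitive script: the function name extract_links is made up for illustration, the directory path is the placeholder from the question, and it assumes each link sits on a single line and is immediately followed by {:target=. If a line holds several such links, only the last one is printed, because the leading .* is greedy.

```shell
#!/bin/bash
# Sketch: print the URL of every Markdown link that is directly followed by {:target=...}.
# extract_links is a hypothetical helper name; its only argument is the posts directory.
extract_links() {
    local post_dir=$1
    local file
    for file in "$post_dir"/*; do
        # Skip subdirectories and the literal pattern left by an unmatched glob.
        [ -f "$file" ] || continue
        # -n suppresses automatic printing; -E enables extended regex (like -r in GNU sed).
        # Only lines containing :target= are edited, and only successful substitutions print.
        sed -nE '/:target=/ s/.*\]\(([^)]+)\)\{:target=.*/\1/p' "$file"
    done
}

# Example call (placeholder path, as in the question):
# extract_links "<jekyll_dir>/_posts" > linklist.txt
```

Redirecting once outside the loop, rather than appending with >> inside it, means linklist.txt is rebuilt from scratch on each run instead of growing with duplicates.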