In my database, there is this column called blog_content that has html structures of blog posts. In those html structures, there are 2 types of links with the following structure:
<a href="https://service-domain/en/articles/how-to-do-smth">Link Title</a>
<a href="https://service-domain/en/articles/learn-to-do#h_4980b42263" target="_self">Link Title 2</a>
Note --- these anchor tags aren't the only HTML that's on 1 row. 1 Row could have multiple anchor tags. Like this:
<h3 data-post-processed="true" id="h_e1f9f37659">Overview</h3>
<ol>
<li>
<p ><a href="https://service-domain/en/articles/how-to-do-smth">Link Title</a></p>
</li>
<li>
<p ><a href="https://service-domain/en/articles/learn-to-do#h_4980b42263" target="_self">Link Title 2</a></p>
</li>
<li>
<p ><a href="https://service-domain/en/articles/learn-to-do#h_4991b33215" target="_self">Link Title 3</a></p>
</li>
</ol>
I am trying to replace the href attribute of all tags that include "#h_" in their own href.
For example, the row above should look like this:
<h3 data-post-processed="true" id="h_e1f9f37659">Overview</h3>
<ol>
<li>
<p ><a href="https://service-domain/en/articles/how-to-do-smth">Link Title</a></p>
</li>
<li>
<p ><a href="#h_4980b42263" target="_self">Link Title 2</a></p>
</li>
<li>
<p ><a href="#h_4991b33215" target="_self">Link Title 3</a></p>
</li>
</ol>
I'm not sure how to approach this because I've found 2-3 threads relatively similar to mine.
Is this something that's even achievable in SQL?
CodePudding user response:
You can use regexp_replace
:
select regexp_replace(blog_content, 'href="[^#"] #h_(. )"', 'href="#h_$1"') from blog_posts