Home > front end >  SQL Replacing HREF of HTML Structure if href contains substring
SQL Replacing HREF of HTML Structure if href contains substring

Time:07-12

In my database, there is this column called blog_content that has html structures of blog posts. In those html structures, there are 2 types of links with the following structure:

  • <a href="https://service-domain/en/articles/how-to-do-smth">Link Title</a>
  • <a href="https://service-domain/en/articles/learn-to-do#h_4980b42263" target="_self">Link Title 2</a>

Note --- these anchor tags aren't the only HTML that's on 1 row. 1 Row could have multiple anchor tags. Like this:

<h3 data-post-processed="true" id="h_e1f9f37659">Overview</h3>
<ol>
    <li>
        <p ><a  href="https://service-domain/en/articles/how-to-do-smth">Link Title</a></p>
    </li>
    <li>
        <p ><a  href="https://service-domain/en/articles/learn-to-do#h_4980b42263" target="_self">Link Title 2</a></p>
    </li>
            <li>
        <p ><a  href="https://service-domain/en/articles/learn-to-do#h_4991b33215" target="_self">Link Title 3</a></p>
    </li>
</ol>

I am trying to replace the href attribute of all tags that include "#h_" in their own href.

For example, the row above should look like this:

<h3 data-post-processed="true" id="h_e1f9f37659">Overview</h3>
<ol>
    <li>
        <p ><a  href="https://service-domain/en/articles/how-to-do-smth">Link Title</a></p>
    </li>
    <li>
        <p ><a  href="#h_4980b42263" target="_self">Link Title 2</a></p>
    </li>
            <li>
        <p ><a  href="#h_4991b33215" target="_self">Link Title 3</a></p>
    </li>
</ol>

I'm not sure how to approach this because I've found 2-3 threads relatively similar to mine.

Is this something that's even achievable in SQL?

CodePudding user response:

You can use regexp_replace:

select regexp_replace(blog_content, 'href="[^#"] #h_(. )"', 'href="#h_$1"') from blog_posts

See fiddle.

  • Related