Home > database >  Extract html and wordpress shortcodes into array of objects
Extract html and wordpress shortcodes into array of objects

Time:10-21

I am working a headless Wordpress site with Nuxt as my front end.

The site has thousands of articles which have shortcodes. I am getting all the page data via graphql and I render content using v-html and it is all working great, but the shortcodes obviously only render as plain text.

They are mostly very simple shortcodes so I am going to create Vue components to replace them using

<component :is="someshortcode">

What I need to do is split my html up into an array of objects I can use to render the parts of the page as html or a component depending what it is.

I imagine the best way to do this is with regex which is where I am stumped.

Let's say I have the following html with some shortcodes

<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

What I want to do is return an array of objects as follows

[
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
    {
        type: 'shortcode',
        content: `[someshortcode attr1="value1" attr2="value2"]`
    },
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
    {
        type: 'shortcode',
        content: `[someshortcode attr1="value1" attr2="value2"]`
    },
    {
        type: 'html',
        content: `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`
    },
]

example of content received from the Rest API

This is the basis of what I need, I will then be able to further break the shortcodes down by getting the properties etc.

What would be the best way of going about this? Is regex the best way?

CodePudding user response:

You could use a DOM Parser and iterate the toplevel elements of the DOM. If such an element is a text node and it has the shortcode format, then create a separate object for it in the output array, otherwise get the HTML for the iterated element and accumulate it while it is not a shortcode, and finally output it as an object:

const html = `<h1>Lorem ipsum dolor sit amet</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>

[someshortcode attr1="value1" attr2="value2"]

<h2>Lorem ipsum dolor sit amet</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus facilisis elit ante. Vivamus semper dui eget justo viverra facilisis. Etiam ut leo fermentum, sagittis mauris nec, placerat lorem.</p>`;

const {body} = new DOMParser().parseFromString(html, 'text/html');
let content = "";
const arr = [];
for (const child of [...body.childNodes]) {
    if (child.nodeType === 3 && child.textContent.trim()[0] == "[") {
        if (content) arr.push({ type: "html", content });
        content = "";
        arr.push({ type: "shortcode", content: child.textContent.trim() });
    } else {
        content  = (child.outerHTML ?? child.textContent);
    }
}
if (content) arr.push({ type: "html", content });
console.log(arr);

  • Related