importing parts of html file using xslt

Time:08-09

Let's say I have three files and I want to include/import their content one after another, and construct a single html file.

header.html

<header>
Hello, this is a header of the web page.
</header>

main.html

<main>
Hello, this is a main part of the web page.
</main>

footer.html

<footer>
Hello, this is a footer part of the web page.
</footer>

The expected output:

<header>
Hello, this is a header of the web page.
</header>
<main>
Hello, this is a main part of the web page.
</main>
<footer>
Hello, this is a footer part of the web page.
</footer>

Is it possible to achieve this in a simple and readable way using XSLT?
I have seen lots of XML merge examples, but none of them satisfied me and they all looked overly complex.
All I want is to join the HTML/XML files into a single file.

CodePudding user response:

You don't mention what version of XSLT you're running, so I'm assuming just version 1.0.

It can be as easy as:

<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head>
    <title>importing parts of html file using xslt</title>
  </head>
  <body>
    <xsl:copy-of select="document('header.html')"/>
    <xsl:copy-of select="document('main.html')"/>
    <xsl:copy-of select="document('footer.html')"/>
  </body>
</html>

Note that the stylesheet does nothing with its input document; all it does is explicitly copy the three named files into a single result document. However, you still have to apply the stylesheet to some input document, and since it doesn't matter which one, you can apply the stylesheet to itself.

NB: this uses the "simplified stylesheet" syntax, in which your stylesheet is a literal result element (html in this case) that is treated as if it were the child of an xsl:template that matches /.
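To make that equivalence concrete, here is a sketch of how the same XSLT 1.0 stylesheet could be written in the full (non-simplified) form. It assumes, as above, that header.html, main.html and footer.html are well-formed XML and sit in the same directory as the stylesheet, since a relative URI passed to document() as a string is resolved against the stylesheet's location.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- This single template fires once, on the root of whatever input document is supplied. -->
  <xsl:template match="/">
    <html>
      <head>
        <title>importing parts of html file using xslt</title>
      </head>
      <body>
        <!-- Each document() call loads one fragment; copy-of copies it unchanged into the output. -->
        <xsl:copy-of select="document('header.html')"/>
        <xsl:copy-of select="document('main.html')"/>
        <xsl:copy-of select="document('footer.html')"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Note that the version attribute is written without the xsl: prefix here, because it sits on the xsl:stylesheet element itself rather than on a literal result element.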

If I were doing this with an up-to-date XSLT processor (i.e. one that supports XSLT version 3.0), I would probably write a stylesheet with a single template named xsl:initial-template. That way you wouldn't have to supply an input document at all.

e.g.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template name="xsl:initial-template">
        <html>
            <head>
                <title>importing parts of html file using xslt</title>
            </head>
            <body>
                <xsl:copy-of select="document('header.html')"/>
                <xsl:copy-of select="document('main.html')"/>
                <xsl:copy-of select="document('footer.html')"/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

CodePudding user response:

You won't be able to parse or process proper HTML with an XML tool chain, since HTML isn't XML: it uses features from SGML (the original markup language on which XML is based) that weren't included in the XML subset, such as tag inference and other short forms. For an SGML-based approach, there's a tutorial I wrote that accomplishes almost exactly what you want, available at sgmljs.net - Producing HTML.

That said, XML does have entities, so the basic technique of including text fragments via entities to produce XHTML (= XML-conformant serialization of HTML markup) works with any XML parser that does DTD processing, which all of the well-known mature ones do:

<!DOCTYPE html [
  <!ENTITY header SYSTEM "header.html">
  <!ENTITY main SYSTEM "main.html">
  <!ENTITY footer SYSTEM "footer.html">
]>
<html>
  &header;
  &main;
  &footer;
</html>

where &header;, &main;, and &footer; are replaced by the respective content of the header.html, main.html, and footer.html files, without needing XSLT or any other programming language.

To render the document (named doc.xhtml here) with the text for header, main, and footer expanded, you can use e.g. xmllint (part of libxml2):

xmllint --noent doc.xhtml

Be aware, though, that a document produced that way isn't commonly accepted as HTML, and isn't even proper XHTML, since it lacks head, body, and title elements (as also observed by @Conal Tuohy). A minimal proper XHTML document looks like this:

<!DOCTYPE html [
  <!ENTITY header SYSTEM "header.html">
  <!ENTITY main SYSTEM "main.html">
  <!ENTITY footer SYSTEM "footer.html">
]>
<html>
  <head>
    <title>Importing parts of XHTML using entities</title>
  </head>
  <body>
    &header;
    &main;
    &footer;
  </body>
</html>