Specify Pandoc HTML numbering to start from <h2>

Time:08-20

I want to convert a Markdown file to HTML with numbered headers, with the header levels starting from <h2>. How can I achieve this?

pandoc provides the option --number-sections (or -N) so headers are numbered in the output. Now I am trying to convert markdown to HTML with this option.

By default, the header levels in pandoc's HTML output start from <h1>. That is not ideal, so I want them to start from <h2> instead (the original markdown may contain many first-level headers, whereas the output HTML should contain at most one <h1>).

It is possible to specify --shift-heading-level-by=1; then, the output header level starts from <h2> (see Official Pandoc User's Guide and maybe also this question). However, it messes up the section numbering, because the level of the section numbering shifts too. Now all sections sit under "0" (like 0.1, 0.2, 0.2.1, …) and no section number starts with 1.

pandoc provides another option, --number-offset=1, but all it does is offset the numbers, like "0.1"→"1.1". Then all section numbers start with 1 and no section number starts with 2, which makes no sense. The leading "1." is redundant and should be removed from all the section numbers, like 1.1→1, 1.1.4→1.4, 1.2.3→2.3, etc.

For demonstration purposes, here is a sample markdown text file (abc.md)

%Test-md

# First Header (1) #

## Header (1-1) ##

# Second Header (2) #

## Header (2-2) ##

### Header (2-3) ###

and its output HTML (simplified) with

pandoc -N --section-divs --shift-heading-level-by=1 -t html5 abc.md
<section id="first-header-1" data-number="0.1">
  <h2 data-number="0.1">0.1 First Header (1)</h2>
    <section id="header-1-1" data-number="0.1.1">
      <h3 data-number="0.1.1">0.1.1 Header (1-1)</h3>
    </section>
</section>
<section id="second-header-2" data-number="0.2">
  <h2 data-number="0.2">0.2 Second Header (2)</h2>
    <section id="header-2-2" data-number="0.2.1">
      <h3 data-number="0.2.1">0.2.1 Header (2-2)</h3>
        <section id="header-2-3" data-number="0.2.1.1">
          <h4 data-number="0.2.1.1">0.2.1.1 Header (2-3)</h4>
        </section>
    </section>
</section>

How can one make pandoc do the numbering in the ordinary way (1, 2, 2.1, 2.2, 2.2.1) yet output the HTML with the header level starting from <h2>?

CodePudding user response:

So far, the easiest solution I have found is to do it in two steps. First, convert the Markdown to HTML with no shift in the header levels, so the numbering comes out right. Then, convert that HTML to another HTML in which the header levels are shifted by 1, so that <h1> becomes <h2>.

Here is an example:

pandoc -N --section-divs -t html5 abc.md |\
  pandoc --from=html -t html5 --shift-heading-level-by=1 > output.html

Note the --from=html in the second pandoc call. It is necessary because pandoc cannot infer the format of piped input from a file extension and would otherwise treat it as Markdown.

Here is the (simplified) output. There is now no redundant common prefix like "0." or "1." in the section-header numbers.

<section id="first-header-1" data-number="1">
  <h2 data-number="1">1 First Header (1)</h2>
    <section id="header-1-1" data-number="1.1">
      <h3 data-number="1.1">1.1 Header (1-1)</h3>
    </section>
</section>
<section id="second-header-2" data-number="2">
  <h2 data-number="2">2 Second Header (2)</h2>
    <section id="header-2-2" data-number="2.1">
      <h3 data-number="2.1">2.1 Header (2-2)</h3>
        <section id="header-2-3" data-number="2.1.1">
          <h4 data-number="2.1.1">2.1.1 Header (2-3)</h4>
        </section>
    </section>
</section>

As a note, --number-offset is irrelevant here: it only makes the numbering start from a number other than the default (1 or 0) and does not change the level of the section numbering.

CodePudding user response:

Pandoc first shifts the headings, then does the numbering. That is not what we want here; we'd like the numbering to happen first. A pandoc Lua filter can be used to take control of this.
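To see why the order matters, here is a rough shell simulation of hierarchical numbering. It is only a sketch, not pandoc's actual implementation: number_levels is a hypothetical helper that keeps one counter per heading level and prints 0 for levels that were never used. Fed the heading levels of abc.md, it reproduces both numbering schemes from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc): reads one heading level per line
# and prints a hierarchical section number for each heading.
number_levels() {
  awk '
    {
      level = $1
      count[level]++                     # bump the counter of this level
      for (i = level + 1; i <= 6; i++)   # reset the counters of deeper levels
        count[i] = 0
      num = ""
      for (i = 1; i <= level; i++) {
        sep = (i > 1) ? "." : ""
        num = num sep (count[i] + 0)     # unused upper levels contribute "0"
      }
      print num
    }'
}

# Heading levels of abc.md are 1, 2, 1, 2, 3.
# Numbering the unshifted levels prints 1, 1.1, 2, 2.1, 2.1.1:
printf '%s\n' 1 2 1 2 3 | number_levels

# Shifting by 1 before numbering prints 0.1, 0.1.1, 0.2, 0.2.1, 0.2.1.1:
printf '%s\n' 2 3 2 3 4 | number_levels
```

Once the levels are shifted, level 1 is never used, so its counter stays at 0 and every number gets the "0." prefix. Numbering before shifting avoids this.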

The function pandoc.utils.make_sections performs the action that's triggered by passing --section-divs on the command line. Matching the effect of --shift-heading-level-by=1 is possible by modifying all Header elements manually:

function Pandoc (doc)
  -- Create and number sections. Setting the first parameter to
  -- `true` ensures that headings are numbered.
  doc.blocks = pandoc.utils.make_sections(true, nil, doc.blocks)

  -- Shift the heading levels by 1
  doc.blocks = doc.blocks:walk {
    Header = function (h)
      h.level = h.level + 1
      return h
    end
  }

  -- Return the modified document
  return doc
end

To use the filter, save it to a file, e.g. shifted-numbered-headings.lua, and pass it to pandoc via the --lua-filter (or -L) option. The --number-sections (or -N) option must still be passed for the numbering to become visible.

pandoc --lua-filter=shifted-numbered-headings.lua --number-sections ...