Home > front end >  Convert list of H1..6 to Hierarchy
Convert list of H1..6 to Hierarchy

Time:09-28

I would like to convert a list of H1 through H6 tags from a markdown file into a Javascript hierarchy for use in a Table of Contents.

Currently the list is generated by AstroJS in this format [{depth: 1, text: 'I am a H1'}, {depth: 2: 'I am a H2}].

Caveats

  • The markdown is created by end-users.
  • This list may have a single root heading (H1 -> H2 -> H3), but
  • It may have multiple root headings (H2 -> H3, H2 -> H3) or
  • It may have a non conventional list of headings (H3, H2 -> H3)
  • It may skip nesting levels (H1 -> H3 -> H6)

Looking for a Javascript or Typescript example.

The following three scenarios are based on some Markdown content that is being processed by an AstroJS website.

Single root heading

This standard SEO friendly set of headings has a single H1 followed by other headings

As markdown

# Main heading
## Sub heading 1
### More info
## Sub heading 2
### Even more info
## Sub heading 3 (edge case)
##### Deep nesting

As flat javascript array

headings = [
  { depth: 1, text: 'Main heading' },
  { depth: 2, text: 'Sub heading 1' },
  { depth: 3, text: 'More info' },
  { depth: 2, text: 'Sub heading 2' },
  { depth: 3, text: 'Even more info' },
  { depth: 2, text: 'Sub heading 3 (edge case)' },
  { depth: 6, text: 'Deep nesting' },
]

As javascript hierarchy

list_of_heading_heirachies = [
  { text: 'Main heading', headings: [
    { text: 'Sub heading 1', headings: [
      { text: 'More info', headings: [] },
    ] },
    { text: 'Sub heading 2', headings: [
      { text: 'Even more info', headings: [] },
    ] },
    { text: 'Sub heading 3 (edge case)', headings: [
      { text: 'Deep nesting', headings: [] },
    ] }
  ]}
]

console.log(list_of_heading_heirachies.length);
// => 1

Multiple root headings

This markdown (common to listicle pages) does not have a single root node like above, instead it has multiple H2s

As markdown

## Why is it done
### Why abc
### Why xyz
## How is it done
### How reason 1
### How reason 2
#### More info
## Conclusion

As flat javascript array

headings = [
  { depth: 2, 'Why is it done' },
  { depth: 3, 'Why abc' },
  { depth: 3, 'Why xyz' },
  { depth: 2, 'How is it done' },
  { depth: 3, 'How reason 1' },
  { depth: 3, 'How reason 2' },
  { depth: 4, 'More info' },
  { depth: 2, 'Conclusion' }
]

As javascript hierarchy

list_of_heading_heirachies = [
  { text: 'Why is it done', headings: [
    { text: 'Why abc', headings: [] },
    { text: 'Why xyz', headings: [] },
  ] },
  { text: 'How is it done', headings: [
    { text: 'How reason 1', headings: [] },
    { text: 'How reason 2', headings: [
      { text: 'More info', headings: [] },
    ] },
  ] },
  { text: 'Conclusion', headings: [] }
]

console.log(list_of_heading_heirachies.length);
// => 3

Non-conventional headings list

This non-conventional headings list happens when there is meta data or breadcrumb data before the general content headings

#### Home -> Blog -> Some Articles
### By Ben Hurr
#### 24th, Sep, 2022
# Some cool Article
## Why abc
### info on why
### more info on why
## How
### How we did it
## Conclusion

As flat javascript array

headings = [
  { depth: 4, text: 'Home -> Blog -> Some Articles' },
  { depth: 3, text: 'By Ben Hurr' },
  { depth: 4, text: '24th, Sep, 2022' },
  { depth: 1, text: 'Some cool Article' },
  { depth: 2, text: 'Why abc' },
  { depth: 3, text: 'info on why' },
  { depth: 3, text: 'more info on why' },
  { depth: 2, text: 'How' },
  { depth: 3, text: 'How we did it' },
  { depth: 2, text: 'Conclusion' },
]

As javascript hierarchy

list_of_heading_heirachies = [
  { text: 'Home -> Blog -> Some Articles', headings: [] },
  { text: 'By Ben Hurr', headings: [
    { text: '24th, Sep, 2022', headings: [] },
  ] },
  { text: 'Some cool Article', headings: [
    { text: 'Why abc', headings: [
      { text: 'info on why', headings: [] },
      { text: 'more info on why', headings: [] },
    ] },
    { text: 'How', headings: [
      { text: 'How we did it', headings: [] },
    ] },
    { text: 'Conclusion', headings: [] },
  ] },
]

console.log(list_of_heading_heirachies.length);
// => 3

CodePudding user response:

I was able to solve this problem in Typescript.

I am not experienced in this language and while it works fine, someone with more experience may be able to improve this code.

My goal was to take any list of H1..6 and turn it into a Hierarchy and then filter to final Hierarchy to produce a useful Table of Content.

Output Screen Shot

enter image description here

Heading.ts

export default class Heading {
  depth: number;
  sequence: number;
  text: string;
  opts: any;

  parent: Heading | null;
  headings: Heading[] | null;

  constructor(depth: number, sequence: number, text: string, opts: any = {}) {
    this.depth = depth;
    this.sequence = sequence;
    this.text = text;
    this.opts = opts;
    this.parent = null;
    this.headings = null;
  }

  add_heading(heading: Heading) {
    heading.parent = this;
    if (this.headings === null) {
      this.headings = [];
    }
    this.headings.push(heading);
  }

  data() {
    let result: any = {
      depth: this.depth,
      sequence: this.sequence,
      text: this.text,
      ...this.opts,
      headings: this.headings?.map((h) => h.data()),
    };
    return result;
  }
}

Run-time usage code

let filterHeadings = headings
  .filter((heading) => heading.depth === 2 || heading.depth === 3)
  .map((heading, index) => new Heading(heading.depth, index, heading.text, { slug: heading.slug } ));

const toc = new CreateToc(filterHeadings);
toc.process();
let tocHeadings = toc.data();

CreateToc.ts

import type Heading from './heading';

export default class CreateToc {
  headings: Heading[];
  hierarchies: Heading[];
  hierarchy_root: Heading | null;
  current_heading: Heading | null;

  constructor(headings: Heading[]) {
    this.headings = headings;
    this.hierarchies = [];
    this.hierarchy_root = null;
    this.current_heading = null;
  }

  process() {
    this.headings.forEach((heading) => {
      this.process_heading(heading);
    });
  }

  data() {
    return this.hierarchies.map((h) => h.data());
  }

  private process_heading(heading: Heading) {
    if (this.hierarchy_root === null) {
      return this.new_hierarchy(heading);
    }
    if (heading.depth > this.current_heading!.depth) {
      return this.add_child_heading(heading);
    }
    if (heading.depth === this.current_heading!.depth) {
      return this.add_sibling_heading(heading);
    }

    // When an up-stream heading is encountered, you have to navigate up to the existing
    // hierarchy or create a new hierarchy out side of the current hierarchy
    if (heading.depth <= this.hierarchy_root.depth) {
      return this.new_hierarchy(heading);
    }
    if (heading.depth < this.current_heading!.depth) {
      return this.add_upstream_heading(heading);
    }
  }

  private new_hierarchy(heading: Heading) {
    this.hierarchies.push(heading);
    this.hierarchy_root = heading;
    this.current_heading = heading;
  }

  private add_child_heading(heading: Heading) {
    this.current_heading!.add_heading(heading);
    this.current_heading = heading;
  }

  private add_sibling_heading(heading: Heading) {
    if (this.current_heading!.parent === null) {
      return;
    }
    this.current_heading = this.current_heading!.parent;
    this.add_child_heading(heading);
  }

  private add_upstream_heading(heading: Heading) {
    while (this.current_heading !== null && heading.depth < this.current_heading.depth) {
      if (this.current_heading.parent === null) {
        throw new Error("current_heading.parent is null");
      }
      this.current_heading = this.current_heading.parent;
    }
    this.add_sibling_heading(heading);
  }
}

CodePudding user response:

That seems like a lot of code for the problem.

I would fold the list of nodes into an array of nodes still available to add children to, sorted by depth. We start with an initial node of depth 0. At the end we can simply extract the children of the first node.

If we don't mind keeping depth properties on your node, that's all it takes. If we really don't want them, then this can be wrapped in a simple function that recursively removes these properties.

Here's one version:

const removeDepth = (xs) => xs .map (
  ({headings = [], depth, ...rest}) => ({...rest, headings: removeDepth(headings)})
)
const makeTree = (xs) => removeDepth (xs .reduce (
  (hs, {depth, text}) => {
    const idx = hs .findLastIndex (x => x .depth < depth)
    const node = {text, headings: [], depth}
    hs [idx] .headings .push (node)
    return [...hs.slice (0, idx   1), node]
  },
  [{text: null, headings: [], depth: 0}]
) [0] .headings)


const singleRoot = [{depth: 1, text: 'Main heading'}, {depth: 2, text: 'Sub heading 1'}, {depth: 3, text: 'More info'}, {depth: 2, text: 'Sub heading 2'}, {depth: 3, text: 'Even more info'}, {depth: 2, text: 'Sub heading 3 (edge case)'}, {depth: 6, text: 'Deep nesting'}]
const multipleRoot = [{depth: 2, text: 'Why is it done'}, {depth: 3, text: 'Why abc'}, {depth: 3, text: 'Why xyz'}, {depth: 2, text: 'How is it done'}, {depth: 3, text: 'How reason 1'}, {depth: 3, text: 'How reason 2'}, {depth: 4, text: 'More info'}, {depth: 2, text: 'Conclusion'}]
const nonCon = [{depth: 4, text: 'Home -> Blog -> Some Articles'}, {depth: 3, text: 'By Ben Hurr'}, {depth: 4, text: '24th, Sep, 2022'}, {depth: 1, text: 'Some cool Article'}, {depth: 2, text: 'Why abc'}, {depth: 3, text: 'info on why'}, {depth: 3, text: 'more info on why'}, {depth: 2, text: 'How'}, {depth: 3, text: 'How we did it'}, {depth: 2, text: 'Conclusion'}]

console .log ('Single root: ', makeTree (singleRoot))
console .log ('Multiple root: ', makeTree (multipleRoot))
console .log ('Non-conventional headings list: ', makeTree (nonCon))
.as-console-wrapper {max-height: 100% !important; top: 0}

The real trick is maintaining the current list of open arrays. We maintain it by finding the last node in the list with a lower depth than our current node, adding our current node as a child of that one, cutting the list at that point, and adding our current node to it. Here's how that array goes as we add node after node from one of your examples:

//Initial list: 
[
  {depth: 0, headings: [], text: null}
]

// After breadcrumbs (i.e. "Home -> Blog -> Some Articles"): (4)
[
  {depth: 0, headings: [<breadcrumbs>], text: null}
  {depth: 4, headings: [], text: "Home -> Blog -> Some Articles"}
]

// After "By Ben Hurr": (3):
[
  {depth: 0, headings: [<breadcrumbs>, <By Ben>], text: null}
  {depth: 3, headings: [], text: "By Ben Hurr"}
]


// After "24th Sept" (4):
[
  {depth: 0, headings: [<breadcrumbs>, <By Ben>], text: null}
  {depth: 3, headings: [<24th Sept>], text: "By Ben Hurr"}
  {depth: 4, headings: [], text: "24th, Sep, 2022"}
]


// After "Some Cool Article" (1):
[
  {depth: 0, headings: [<breadcrumbs>, <By Ben>, <Some Cool>], text: null}
  {depth: 1, headings: [], text: 'Some Cool Article'}
]

// After "Why ABC" (2):
[
  {depth: 0, headings: [<breadcrumbs>, <By Ben>, <Some Cool>], text: null}
  {depth: 1, headings: [<Why abc>], text: 'Some Cool Article'}
  {depth: 2, headings: [], text: 'Why abc'}
]

// ...

// Finally

// After "Conclusion" (2):
[
  {depth: 0, headings: [<breadcrumbs>, <By Ben>, <Some Cool>], text: null}
  {depth: 1, headings: [<Why abc>, <How>, <Conclusion>], text: 'Some Cool Article'}
  {depth: 2, headings: [], text: 'Conclusion'}
]

I think this is significantly simpler than your approach, but I must admit to not having read your approach carefully; it simply seemed too much code for an interesting, but not overly complex, problem.

  • Related