How to parse HTML to markdown and, in reverse, markdown to HTML in Angular?

Time:09-14

I need a parser that can convert text from Angular Editor, which is a string field filled with HTML, to markdown.

I also need the reverse action: parsing markdown text into a string field with HTML.

Thank you in advance.

CodePudding user response:

Narek.

I had the same problem converting an HTML string to markdown. The few libraries I found could convert in only one direction, and even then they did not handle all the elements.

After a lot of searching and disappointment, I decided to create a service that does both conversions: html => markdown and markdown => html.

Here is the service I created for my project; maybe it can help you too.

import { Injectable } from '@angular/core';
import * as Markdown from 'marked';

@Injectable()
export class MarkdownHtmlParserService {

  public parseHtmlToMarkdown(html: string): string {
    if (!html) {
      return '';
    }
    html = this.setBreaksToHtml(html);

    let markdown = html;
    // Parse the HTML in a detached element so the anchors can be read from the DOM.
    const snippet = document.createElement('div');
    snippet.innerHTML = markdown;
    const links = snippet.getElementsByTagName('a');
    for (let i = 0; i < links.length; i++) {
      // Replace each serialized anchor with its markdown equivalent.
      const markdownLink = `[${links[i].innerText}](${links[i].href})`;
      markdown = markdown.replace(links[i].outerHTML, markdownLink);
    }

    markdown = markdown.replace(/<h1>/g, '# ').replace(/<\/h1>/g, '');
    markdown = markdown.replace(/<h2>/g, '## ').replace(/<\/h2>/g, '');
    markdown = markdown.replace(/<h3>/g, '### ').replace(/<\/h3>/g, '');
    markdown = markdown.replace(/<h4>/g, '#### ').replace(/<\/h4>/g, '');
    markdown = this.parseAll(markdown, 'strong', '**');
    markdown = this.parseAll(markdown, 'b', '**');
    markdown = this.parseAll(markdown, 'em', '__');
    markdown = this.parseAll(markdown, 'i', '__');
    markdown = this.parseAll(markdown, 's', '~~');
    markdown = markdown.replace(/<p><br><\/p>/g, '\n');
    markdown = markdown.replace(/<br>/g, '\n');
    markdown = markdown.replace(/<p>/g, '').replace(/<\/p>/g, '  \n');
    markdown = markdown.replace(/<div>/g, '').replace(/<\/div>/g, '  \n');
    markdown = markdown
      .replace(/<blockquote>/g, '> ')
      .replace(/<\/blockquote>/g, '');

    markdown = this.parseList(markdown, 'ol', '1.');
    markdown = this.parseList(markdown, 'ul', '-');

    return markdown;
  }

  public parseMarkdownToHtml(markdown: string): string {
    markdown = this.setItalicSymbols(markdown);
    return Markdown.parse(markdown);
  }

  private setItalicSymbols(markdown: string): string {
    let regex = /\__(.*?)\__/g;
    let match;
    do {
      if (match) {
        markdown = markdown.replace(match[0], '<i>' + match[1] + '</i>');
      }
      match = regex.exec(markdown);
    } while (match);
    return markdown;
  }

  private parseAll(html: string, htmlTag: string, markdownEquivalent: string): string {
    const regEx = new RegExp(`<\/?${htmlTag}>`, 'g');
    return html.replace(regEx, markdownEquivalent);
  }

  private parseList(
    html: string,
    listType: 'ol' | 'ul',
    identifier: string
  ): string {
    let parsedHtml = html;

    const getNextListRegEx = new RegExp(`<${listType}>.+?<\/${listType}>`);

    while (parsedHtml.match(getNextListRegEx) !== null) {
      const matchedList = parsedHtml.match(getNextListRegEx);

      // match() returns an array; index 0 holds the matched list markup.
      const elements = this.htmlToElements(matchedList[0]);
      const listItems = [];

      elements[0].childNodes.forEach((listItem) => {
        let parsedListItem = `${identifier} ${listItem.textContent}`;

        // @ts-ignore
        const className = listItem.className;
        if (className) {
          const splittedClassName = className.split('-');
          // The last segment of the class name is the nesting level; default to 0.
          const numberOfLevel =
            parseInt(splittedClassName[splittedClassName.length - 1], 10) || 0;

          for (let i = 0; i < numberOfLevel; i++) {
            parsedListItem = `   ${parsedListItem}`;
          }
        }

        listItems.push(parsedListItem);
      });

      parsedHtml = parsedHtml.replace(
        getNextListRegEx,
        listItems.join('\n') + '\n\n'
      );
    }

    return parsedHtml;
  }

  private htmlToElements(html) {
    const template = document.createElement('template');
    template.innerHTML = html;
    return template.content.childNodes;
  }

  private setBreaksToHtml(html: string): string {
    return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
  }
}
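
The per-level heading replacements in parseHtmlToMarkdown() could probably be collapsed into a single pass. The sketch below is only an illustration, not part of the service: headingsToMarkdown is a hypothetical name, and it assumes headings carry no attributes and contain no line breaks. Unlike the service, it also appends a newline after each heading.

```typescript
// Hypothetical helper (not in the original service): converts <h1>..<h6>
// elements to ATX-style markdown headings in one replace call.
// Assumption: tags have no attributes and the heading text has no newlines.
function headingsToMarkdown(html: string): string {
  return html.replace(
    // \1 forces the closing tag to have the same level as the opening tag.
    /<h([1-6])>(.*?)<\/h\1>/g,
    (_match: string, level: string, text: string) =>
      `${'#'.repeat(Number(level))} ${text}\n`
  );
}

console.log(headingsToMarkdown('<h1>Title</h1><h3>Part</h3>'));
```

Because the closing tag is matched through the backreference, a mismatch between opening and closing levels cannot slip through unnoticed.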

The only library you need to install is marked; find a version that is compatible with your version of Angular.

I added two extra functions, setItalicSymbols() and setBreaksToHtml(). In my case, AngularEditor did not turn <p></p> tags into \n line breaks, so I call setBreaksToHtml() before converting to markdown. Marked does not convert text between double underscores (for example '__text__') into text between <i> or <em> tags, so I call setItalicSymbols() before parsing with the marked.parse() function.
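
To see what these two helper steps do to a string, here is a small sketch that can be run outside Angular. The function names paragraphsToBreaks and underscoresToItalic are hypothetical stand-ins, and the italic step is written as a single replace call rather than the loop used in the service, so treat it as an approximation of the same idea.

```typescript
// Standalone stand-ins for the two helper steps (hypothetical names),
// written as plain functions so they need neither Angular nor the DOM.

// Same replacements as setBreaksToHtml(): each <p> becomes "<br> "
// and each closing </p> is dropped.
function paragraphsToBreaks(html: string): string {
  return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
}

// Same goal as setItalicSymbols(): __text__ becomes <i>text</i>.
// The non-greedy group keeps neighbouring __a__ and __b__ spans separate.
function underscoresToItalic(markdown: string): string {
  return markdown.replace(/__(.*?)__/g, '<i>$1</i>');
}

console.log(paragraphsToBreaks('<p>one</p><p>two</p>')); // "<br> one<br> two"
console.log(underscoresToItalic('a __b__ c'));            // "a <i>b</i> c"
```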

Hope you will find it useful.
