I need a parser that converts the text produced by Angular Editor (a string field containing HTML) to Markdown.
I also need the reverse: converting Markdown text back to a string field containing HTML.
Thank you in advance.
CodePudding user response:
Narek.
I had the same problem converting an HTML string to Markdown. The few libraries I found could convert in only one direction, and even then they did not handle all the elements.
After a lot of searching and disappointment I decided to create a service that does both conversions: html => markdown and markdown => html.
Here is the service I created for my project; maybe it can help you too.
import { Injectable } from '@angular/core';
import * as Markdown from 'marked';

@Injectable()
export class MarkdownHtmlParserService {
  // Converts the editor's HTML string to Markdown.
  public parseHtmlToMarkdown(html: string): string {
    if (!html) {
      return '';
    }
    html = this.setBreaksToHtml(html);
    let markdown = html;

    // Replace every <a> element with a Markdown link: [text](href).
    const snipped = document.createElement('div');
    snipped.innerHTML = markdown;
    const links = snipped.getElementsByTagName('a');
    for (let i = 0; i < links.length; i++) {
      if (links[i]) {
        const marked = `[${links[i].innerText}](${links[i].href})`;
        markdown = markdown.replace(links[i].outerHTML, marked);
      }
    }

    // Headings: each closing tag must match its opening tag.
    markdown = markdown.replace(/<h1>/g, '# ').replace(/<\/h1>/g, '');
    markdown = markdown.replace(/<h2>/g, '## ').replace(/<\/h2>/g, '');
    markdown = markdown.replace(/<h3>/g, '### ').replace(/<\/h3>/g, '');
    markdown = markdown.replace(/<h4>/g, '#### ').replace(/<\/h4>/g, '');

    // Inline formatting.
    markdown = this.parseAll(markdown, 'strong', '**');
    markdown = this.parseAll(markdown, 'b', '**');
    markdown = this.parseAll(markdown, 'em', '__');
    markdown = this.parseAll(markdown, 'i', '__');
    markdown = this.parseAll(markdown, 's', '~~');

    // Line breaks, paragraphs and block elements.
    markdown = markdown.replace(/<p><br><\/p>/g, '\n');
    markdown = markdown.replace(/<br>/g, '\n');
    markdown = markdown.replace(/<p>/g, '').replace(/<\/p>/g, ' \n');
    markdown = markdown.replace(/<div>/g, '').replace(/<\/div>/g, ' \n');
    markdown = markdown
      .replace(/<blockquote>/g, '> ')
      .replace(/<\/blockquote>/g, '');

    // Lists.
    markdown = this.parseList(markdown, 'ol', '1.');
    markdown = this.parseList(markdown, 'ul', '-');
    return markdown;
  }

  // Converts Markdown to an HTML string using marked.
  public parseMarkdownToHtml(markdown: string): string {
    markdown = this.setItalicSymbols(markdown);
    return Markdown.parse(markdown);
  }

  // Wraps text between double underscores in <i> tags before marked runs.
  private setItalicSymbols(markdown: string): string {
    const regex = /__(.*?)__/g;
    let match;
    do {
      if (match) {
        markdown = markdown.replace(match[0], '<i>' + match[1] + '</i>');
      }
      match = regex.exec(markdown);
    } while (match);
    return markdown;
  }

  // Replaces both the opening and the closing tag with the same Markdown marker.
  private parseAll(
    html: string,
    htmlTag: string,
    markdownEquivalent: string
  ): string {
    const regEx = new RegExp(`</?${htmlTag}>`, 'g');
    return html.replace(regEx, markdownEquivalent);
  }

  // Converts every <ol> or <ul> block to Markdown list items.
  private parseList(
    html: string,
    listType: 'ol' | 'ul',
    identifier: string
  ): string {
    let parsedHtml = html;
    // Non-greedy match of the next complete list of this type.
    const getNextListRegEx = new RegExp(`<${listType}>.+?</${listType}>`);
    let matchedList = parsedHtml.match(getNextListRegEx);
    while (matchedList !== null) {
      const elements = this.htmlToElements(matchedList[0]);
      const listItems: string[] = [];
      elements[0].childNodes.forEach((listItem) => {
        let parsedListItem = `${identifier} ${listItem.textContent}`;
        const className = (listItem as HTMLElement).className;
        if (className) {
          // The last part of the class name is read as the nesting level.
          const splittedClassName = className.split('-');
          const numberOfLevel = parseInt(
            splittedClassName[splittedClassName.length - 1] || '0',
            10
          );
          for (let i = 0; i < numberOfLevel; i++) {
            parsedListItem = ` ${parsedListItem}`;
          }
        }
        listItems.push(parsedListItem);
      });
      parsedHtml = parsedHtml.replace(
        getNextListRegEx,
        listItems.join('\n') + '\n\n'
      );
      matchedList = parsedHtml.match(getNextListRegEx);
    }
    return parsedHtml;
  }

  // Parses an HTML string into DOM nodes without attaching them to the page.
  private htmlToElements(html: string): NodeListOf<ChildNode> {
    const template = document.createElement('template');
    template.innerHTML = html;
    return template.content.childNodes;
  }

  // Turns each <p> into a <br> so paragraphs later become new lines.
  private setBreaksToHtml(html: string): string {
    return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
  }
}
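The heading and inline-tag conversions in the service are plain string replacements, so they can be tried outside Angular. Below is a minimal sketch of that idea; convertHeadings and convertInlineTag are hypothetical names used only for illustration, and the loop is one possible way to keep each closing tag paired with its opening tag.

```typescript
// Hypothetical standalone helpers; not part of the service above.

// Replace <h1>..<h4> with the matching number of '#' characters.
function convertHeadings(html: string): string {
  let result = html;
  let hashes = '';
  for (let level = 1; level <= 4; level++) {
    hashes += '#';
    const open = new RegExp(`<h${level}>`, 'g');
    const close = new RegExp(`</h${level}>`, 'g');
    result = result.replace(open, hashes + ' ').replace(close, '');
  }
  return result;
}

// Replace both the opening and closing form of a tag with one Markdown marker.
function convertInlineTag(html: string, tag: string, marker: string): string {
  return html.replace(new RegExp(`</?${tag}>`, 'g'), marker);
}

const sample = '<h2>Title</h2><strong>bold</strong> and <em>italic</em>';
const converted = convertInlineTag(
  convertInlineTag(convertHeadings(sample), 'strong', '**'),
  'em',
  '__'
);
console.log(converted); // "## Title**bold** and __italic__"
```

As in the service, the closing heading tag is simply removed, so no line break is added after the heading text.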
The only library you need to install is marked; pick a version that is compatible with your version of Angular.
I also created two extra functions, setItalicSymbols() and setBreaksToHtml().
In my case the <p></p> tags coming from AngularEditor were not converted to a new line (\n), so I call setBreaksToHtml() before parsing to markdown.
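To illustrate what setBreaksToHtml() does to the editor output, here is a small standalone sketch. The function body is copied from the service; breaksToNewlines is a hypothetical helper that mirrors the later `<br>` replacement in parseHtmlToMarkdown(), and the sample input is an assumption about what the editor produces.

```typescript
// Same logic as the service method, written as a plain function.
function setBreaksToHtml(html: string): string {
  return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
}

// Hypothetical helper mirroring the <br> step of parseHtmlToMarkdown().
function breaksToNewlines(html: string): string {
  return html.replace(/<br>/g, '\n');
}

const editorOutput = '<p>first</p><p>second</p>'; // assumed sample input
const withBreaks = setBreaksToHtml(editorOutput);
const text = breaksToNewlines(withBreaks);

console.log(JSON.stringify(withBreaks)); // "<br> first<br> second"
console.log(JSON.stringify(text)); // "\n first\n second"
```

Each paragraph ends up on its own line, with a leading space left over from the '<br> ' replacement.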
Marked doesn't convert text between two underscores (for example '__text__') to text between <i> or <em> tags (it renders it as bold instead), so I call setItalicSymbols() before parsing with the marked.parse() function.
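The underscore pre-processing can be checked on its own. The sketch below is a simplified variant, not the exact loop from the service: it uses a single String.prototype.replace call with a capture group, which should give the same result for non-nested markers.

```typescript
// Simplified variant of setItalicSymbols(): wrap __text__ in <i> tags.
function setItalicSymbols(markdown: string): string {
  // (.*?) is non-greedy, so each pair of double underscores is handled separately.
  return markdown.replace(/__(.*?)__/g, '<i>$1</i>');
}

const input = 'a __b__ and __c__';
const output = setItalicSymbols(input);
console.log(output); // "a <i>b</i> and <i>c</i>"
```

The resulting string still contains Markdown for everything else, so it is passed on to marked afterwards, and the inline <i> tags are expected to pass through as raw HTML.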
Hope you will find it useful.