I have a large text from which heading strings have been extracted using regex. The headings start with 1 to 6 hash characters (#). Here is an example of the input array:
const content = [
  "#1",
  "##1a",
  "###1a1",
  "###1a2",
  "##1b",
  "#2",
  "#3",
  "##3a",
  "##3b",
  "#4",
];
The heading level (the number of hashes at the beginning of the string) describes where a heading sits in the chapter hierarchy. I would like to parse my input into an array of heading objects, each containing the heading text without the hashes and the heading's nested chapters. The desired output for the array above is:
export interface Heading {
  chapters: Heading[];
  text: string;
}
const headings: Heading[] = [
  {
    text: "1",
    chapters: [
      {
        text: "1a",
        chapters: [
          { text: "1a1", chapters: [] },
          { text: "1a2", chapters: [] },
        ],
      },
      { text: "1b", chapters: [] },
    ],
  },
  { text: "2", chapters: [] },
  {
    text: "3",
    chapters: [
      { text: "3a", chapters: [] },
      { text: "3b", chapters: [] },
    ],
  },
  { text: "4", chapters: [] },
];
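Since the level is just the length of the leading run of `#` characters, a line can be split into level and text without knowing the maximum depth in advance. A minimal sketch, using a hypothetical helper `parseHeading` that is not part of the question's code; unlike `replace(/#/g, "")`, it only strips the leading hashes, so a `#` inside the text (e.g. `C#`) survives:

```typescript
// Hypothetical helper (not in the question): split "###1a1" into level and text.
interface ParsedHeading {
  level: number; // number of leading '#' characters
  text: string; // the rest of the line, trimmed
}

function parseHeading(line: string): ParsedHeading {
  let level = 0;
  // Count only the leading run of '#', so a '#' inside the text is kept.
  while (level < line.length && line[level] === "#") {
    level++;
  }
  return { level, text: line.slice(level).trim() };
}

console.log(JSON.stringify(parseHeading("###1a1"))); // {"level":3,"text":"1a1"}
```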
I tried writing a function that parses the strings, but got stuck on how to determine which parent heading the current string belongs to:
export const getHeadings = (content: string[]): Heading[] => {
  let headingLevel = 2;
  let headingIndex = 0;
  const allHeadings = content.reduce((acc, currentHeading) => {
    const hashTagsCount = countHastags(currentHeading);
    const sanitizedHeading = currentHeading.replace(/#/g, "").trim();
    const heading = {
      chapters: [],
      text: sanitizedHeading,
    };
    if (hashTagsCount === headingLevel) {
      headingIndex = headingIndex + 1;
    } else {
      headingIndex = 0;
    }
    headingLevel = hashTagsCount;
    if (hashTagsCount === 2) {
      acc.push(heading);
    } else if (hashTagsCount === 3) {
      if (acc.length === 0) {
        return acc;
      }
      if (acc.length === 1) {
        acc[acc.length - 1]["chapters"].push(heading);
      }
    } else if (acc.length === 2) {
      acc[acc.length - 1]["chapters"][headingIndex]["chapters"].push(heading);
    } else if (acc.length === 3) {
      acc[acc.length - 1]["chapters"][headingIndex]["chapters"][headingIndex][
        "chapters"
      ].push(heading);
    }
    return acc;
  }, []);
  return allHeadings;
};
While this works for a very simple case, it does not scale: the number of heading levels is hard-coded in the if statements. How can I rewrite this so that it works for any number of levels (hashes)?
Answer:
With a reduce-based approach one can keep track of the correct (nested) chapters array into which each new chapter item has to be pushed.
Thus the accumulator can be an object which, in addition to the result array, features an index/map of the chapters arrays for every nesting level that needs to be tracked.
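To make the idea of one tracked `chapters` array per nesting level concrete, here is a sketch in TypeScript that uses a plain loop and an array indexed by level instead of the answer's reduce and object map. `buildHeadings` is a hypothetical name, and the handling of skipped levels (a `###` directly under a `#` is attached to the deepest open heading) is an assumption of this sketch, not something the answer specifies:

```typescript
// Same shape as the question's `Heading` interface.
interface Heading {
  text: string;
  chapters: Heading[];
}

// Hypothetical helper: builds the tree by remembering one `chapters` array per level.
function buildHeadings(lines: string[]): Heading[] {
  const result: Heading[] = [];
  // levels[d] is the array that receives the next heading at depth d ("#" is depth 0).
  const levels: Heading[][] = [result];

  for (const line of lines) {
    // Count the leading '#' characters.
    let hashes = 0;
    while (line[hashes] === "#") hashes++;
    if (hashes === 0) continue; // not a heading, skip it

    // Assumption: a skipped level attaches to the deepest open heading.
    const depth = Math.min(hashes, levels.length) - 1;
    const item: Heading = { text: line.slice(hashes).trim(), chapters: [] };
    levels[depth].push(item);

    // Forget deeper arrays from an earlier branch, then open this item's level.
    levels.length = depth + 1;
    levels.push(item.chapters);
  }
  return result;
}

console.log(JSON.stringify(buildHeadings(["#1", "##1a", "###1a1", "#2"])));
// [{"text":"1","chapters":[{"text":"1a","chapters":[{"text":"1a1","chapters":[]}]}]},{"text":"2","chapters":[]}]
```

Truncating `levels` after each push is what keeps a later, shallower heading from reusing a stale array that belongs to an earlier branch.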
Each heading string that gets reduced is decomposed into its '#' (hash) based flag and its text content. This is done with the help of the following regex, which features named capturing groups:
/^(?<flag>#+)\s*(?<content>.*?)\s*$/
The number of hashes (flag.length) indicates the current nesting level.
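The decomposition step can be tried in isolation. A minimal TypeScript sketch, where `decompose` is a hypothetical helper (not part of the answer) that applies the answer's regex and returns the level and content, or `null` for a string without leading hashes:

```typescript
// The regex from the answer: `flag` is the run of hashes, `content` the trimmed text.
const headingPattern = /^(?<flag>#+)\s*(?<content>.*?)\s*$/;

// Hypothetical helper that mirrors the answer's destructuring step.
function decompose(heading: string): { level: number; content: string } | null {
  const groups = headingPattern.exec(heading)?.groups;
  if (!groups) return null; // no leading '#', so not a heading
  return { level: groups.flag.length, content: groups.content };
}

console.log(JSON.stringify(decompose("## fox jumps (1a)"))); // {"level":2,"content":"fox jumps (1a)"}
console.log(JSON.stringify(decompose("##buzz (3a) "))); // {"level":2,"content":"buzz (3a)"}
console.log(JSON.stringify(decompose("##"))); // {"level":2,"content":""}
console.log(JSON.stringify(decompose("no hashes"))); // null
```

The lazy `.*?` combined with the trailing `\s*$` is what strips trailing whitespace from `content`, while the `\s*` after the flag strips the leading whitespace.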
function traceAndAggregateChapterHierarchy({ chaptersMap = {}, result }, heading) {
  const {
    flag = '',
    content = '',
  } = (/^(?<flag>#+)\s*(?<content>.*?)\s*$/)
    .exec(heading)
    ?.groups ?? {};
  const nestingLevel = flag.length;
  // only process strings that start with at least one '#'.
  if (nestingLevel >= 1) {
    let chapters;
    if (nestingLevel === 1) {
      // reset the map for a new top-level chapter.
      chaptersMap = {};
      // level-1 chapter items need to be pushed into `result`.
      chapters = result;
    } else {
      // create or access the `chapters` array of this deeper nesting level.
      chapters = (chaptersMap[nestingLevel] ??= []);
    }
    // create a new chapter item.
    const chapterItem = {
      text: content || '$$ missing header content $$',
      chapters: [],
    };
    // create/reassign the next level's `chapters` array.
    chaptersMap[nestingLevel + 1] = chapterItem.chapters;
    // push the new item into the correct `chapters` array.
    chapters.push(chapterItem);
  }
  return { chaptersMap, result };
}
const content = [
  "# The quick brown (1) ",
  "## fox jumps (1a)",
  "###over (1a1)",
  "#### ",
  "###the (1a2)",
  "## lazy dog (1b)",
  "# Foo bar (2)",
  "# Baz biz (3)",
  "##buzz (3a) ",
  "##booz (3b) ",
  "# Lorem ipsum (4) ",
  "##",
];
const { result: headings } = content
  .reduce(traceAndAggregateChapterHierarchy, { result: [] });
console.log({ content, headings });
Answer:
A short solution without mutable state :)
It works by recursively removing the first # from every heading and grouping each heading with the deeper headings that follow it.
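The grouping step is the core of this answer. One level of it can be sketched in TypeScript as a plain loop (for readability, unlike the answer's reduce); `groupLevel` is a hypothetical helper, and it expects its input to already have one `#` stripped from every line, as the `.map((str) => str.slice(1))` step does:

```typescript
// Hypothetical helper: one level of the answer's grouping step.
// A line without a leading '#' opens a new group; a line that still starts
// with '#' is a descendant and joins the most recent group.
function groupLevel(lines: string[]): string[][] {
  const groups: string[][] = [];
  for (const line of lines) {
    if (line.startsWith("#") && groups.length > 0) {
      groups[groups.length - 1].push(line);
    } else {
      // Assumption: a '#' line with no open group starts its own group.
      groups.push([line]);
    }
  }
  return groups;
}

// Input after one '#' has been stripped from "#1", "##1a", "###1a1", "#2", "#3", "##3a":
console.log(JSON.stringify(groupLevel(["1", "#1a", "##1a1", "2", "3", "#3a"])));
// [["1","#1a","##1a1"],["2"],["3","#3a"]]
```

Each group's first element becomes `text`, and the remaining elements are fed back into the same procedure to build `chapters`, which is where the recursion comes from.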
const content = [
  "#1",
  "##1a",
  "###1a1",
  "###1a2",
  "##1b",
  "#2",
  "#3",
  "##3a",
  "##3b",
  "#4",
];
const getNesting = (arr) =>
  arr
    .map((str) => str.slice(1)) // remove first #
    .reduce(
      (acc, cur) =>
        // group heading level: a string that still starts with # belongs to
        // the most recent group, anything else opens a new group at the end
        cur.match(/^#/)
          ? [...acc.slice(0, -1), acc[acc.length - 1].concat(cur)]
          : [...acc, [cur]],
      []
    )
    .map(([text, ...subh]) => ({
      text,
      // recursive call (an empty `subh` yields an empty array)
      chapters: getNesting(subh),
    }));
console.log(JSON.stringify(getNesting(content)));