JavaScript regex for finding text several lines before the match

I am trying to create a regex that finds text in a markdown file. Basically, I have "tasks" marked with the - [ ] or - [x] characters (undone or done) and project headers (marked with ##). I would like to find all undone tasks and their project names.

For example, for this sample text:

# Top of File

## Project A
Descriptive line

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec finibus elit non nibh lobortis molestie.

- [ ] an undone task
- [x] a completed task
- [x] second completed task

## Project B
Descriptive line

- [x] a completed task
- [ ] an uncompleted task

## Project C
Descriptive line

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec finibus elit non nibh lobortis molestie.

- [x] completed task
- [ ] uncompleted task
- [x] completed task

I would like to return:

Project A, an undone task
Project B, an uncompleted task
Project C, uncompleted task

The pattern below gets close, but it needs a fixed number of lines to look back over, and the number of tasks per project varies too much for that: ((.*(\n|\r|\r\n)){5})\- \[ \]

CodePudding user response：

We can use match() with an alternation to find every line that is either a project header or an undone task. Then we combine each header with the task that follows it, separated by a comma.

var input = `# Top of File

## Project A
Descriptive line

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec finibus elit non nibh lobortis molestie.

- [ ] an undone task
- [x] a completed task
- [x] second completed task

## Project B
Descriptive line

- [x] a completed task
- [ ] an uncompleted task

## Project C
Descriptive line

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec finibus elit non nibh lobortis molestie.

- [x] completed task
- [ ] uncompleted task
- [x] completed task`;

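// Collect project headers and undone tasks in document order.
var lines = input.match(/## (.*)|- \[ \] (.*)/g)
                 // Strip the leading "## " or "- [ ] " marker from each match.
                 .map(x => x.match(/\w+(?: \w+)*/g)[0]);
var output = [];
var i = 0;
// Pair each header with the task after it (assumes one undone task per project).
while (i < lines.length) {
    output.push(lines[i] + ", " + lines[i + 1]);
    i += 2;
}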

console.log(output);

Here is an explanation of the regex pattern used to find the matching lines:

## (.*) match and capture the project text
| OR
- \[ \] (.*) match and capture the incomplete text

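However, because the pattern uses the g flag, match() returns each full match, including the leading marker (e.g. ##), rather than the captured groups. So I added a map() step that strips this leading content. Finally, we iterate over the array of lines and join each header with the task after it using a comma. Note that this pairing assumes each project has exactly one undone task, as in the sample.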
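If a project can have several undone tasks, or none at all, pairing the matches two at a time would misalign headers and tasks. A minimal sketch of one alternative is to walk the matches in order and remember the most recent header. The function name undoneTasksByProject and the sample string are hypothetical, made up for illustration. It uses matchAll(), which exposes the capture groups directly, so no separate map() step is needed.

```javascript
// Hypothetical helper: pairs every undone task with the most recent "## " header.
function undoneTasksByProject(markdown) {
    const results = [];
    let project = null; // most recent "## " header seen so far
    // The m flag makes ^ and $ match at line boundaries; g is required by matchAll().
    const pattern = /^## (.*)$|^- \[ \] (.*)$/gm;
    for (const m of markdown.matchAll(pattern)) {
        if (m[1] !== undefined) {
            // Group 1 is set when the line is a project header.
            project = m[1].trim();
        } else if (project !== null) {
            // Group 2 is set when the line is an undone task.
            results.push(project + ", " + m[2].trim());
        }
    }
    return results;
}

// Illustrative input: Project A has two undone tasks, Project B has none.
const sample = [
    "## Project A",
    "- [ ] first task",
    "- [x] done task",
    "- [ ] second task",
    "",
    "## Project B",
    "- [x] done task",
    "",
    "## Project C",
    "- [ ] third task",
].join("\n");

console.log(undoneTasksByProject(sample));
```

Undone tasks that appear before any ## header are skipped in this sketch; whether that is the right behavior depends on how the file is organized.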