Regex matching everything in a sentence starting with a number and a period and another period

Time:11-29

So I have this text:

[screenshot of the text; transcribed under Edit 1 below]

Using JavaScript, I am trying to develop a regex that matches the first 4 sentences, including any newline characters.

The closest I have got is /([0-9])+.*/gm, but it's incomplete: it ignores the second part of the sentence starting with "4." because of the \n character between the words "completed" and "tasks". It only matches the part highlighted in blue in the screenshot.

Any ideas on how to include:

"tasks and the last task that was submitted."

in the match?


Edit 1: Here's the text in the screenshot:

  1. It creates a directory for the log file if it doesn't exist.

  2. It checks that the log file is newline-terminated.

  3. It writes a newline-terminated JSON object to the log file.

  4. It reads the log file and returns a dictionary with the set of completed

    tasks and the last task that was submitted.

Using /^\d+\.[\w\W]*?(?=\n\n|\n\d+\.)/gm, as suggested in the first comment by @Wiktor Stribizew, works. However, with this sample text it doesn't (it skips the last line):

  1. The extension is activated the very first time the command is executed
  2. The command handler parses the user's selection and calls the explain function
  3. The explain function returns a promise that resolves to the explanation
  4. The command handler displays the explanation in a message box

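The skipped last line can be reproduced with a short script. This is a minimal sketch, assuming the sample is held in a string with no leading indentation and no trailing newline; the variable names and the extra end-of-input alternative (?![\w\W]) are illustrative additions, not part of the suggested regex:

```javascript
// Assumption: the sample has no leading indentation and no trailing newline.
const sample = [
  "1. The extension is activated the very first time the command is executed",
  "2. The command handler parses the user's selection and calls the explain function",
  "3. The explain function returns a promise that resolves to the explanation",
  "4. The command handler displays the explanation in a message box",
].join("\n");

// Suggested regex: an item ends only before a blank line or before "\n<digits>.".
const suggested = /^\d+\.[\w\W]*?(?=\n\n|\n\d+\.)/gm;
const suggestedMatches = sample.match(suggested) || [];
console.log(suggestedMatches.length); // 3: nothing follows item 4, so its lookahead never succeeds

// Illustrative variation: (?![\w\W]) succeeds only at the very end of the input,
// so the final item is allowed to end there as well.
const withEndOfInput = /^\d+\.[\w\W]*?(?=\n\n|\n\d+\.|(?![\w\W]))/gm;
const allMatches = sample.match(withEndOfInput) || [];
console.log(allMatches.length); // 4
```

A plain $ is not used for the extra alternative because, with the m flag, it also matches at the end of every line, which would cut a wrapped item short.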


Edit 2: To clarify, the regex needs to match any number of sentences, each starting with a number and a period and ending with a period. Sometimes there are 4, and sometimes there could be anything up to 20.


CodePudding user response:

I would phrase the regex as (?:.*?(?:\n|$)){1,4}:

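var input = `1. Line One.
2. Line 2.
3. Line 3.
4. Line 4.
blah blah blah`;

// Each repetition consumes one line (lazily, up to a newline or the end of input);
// {1,4} keeps at most the first four lines.
var lines = input.match(/(?:.*?(?:\n|$)){1,4}/)[0];
console.log(lines);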

CodePudding user response:

You can use the regex /^\d+\.[^\.]+\./gm to match only lines that start with a number and a dot, up to the next dot, even if that dot is on a subsequent line:

const input = `Some preamble stuff to ignore.
1. It creates a directory for the log file if it doesn't exist.
2. It checks that the log file is newline-terminated.
3. It writes a newline-terminated JSON object to the log file.
4. It reads the log file and returns a dictionary with the set of completed
   tasks and the last task that was submitted.
Some more stuff to ignore`;

let result = input.match(/^\d+\.[^\.]+\./gm);
console.log(result);

Output:

[
  "1. It creates a directory for the log file if it doesn't exist.",
  "2. It checks that the log file is newline-terminated.",
  "3. It writes a newline-terminated JSON object to the log file.",
  "4. It reads the log file and returns a dictionary with the set of completed\n   tasks and the last task that was submitted."
]

Explanation of regex:

  • ^ -- anchor at start of line
  • \d+ -- 1+ digits
  • \. -- literal dot
  • [^\.]+ -- 1+ characters that are not a dot (newlines included), i.e. everything up to the next dot
  • \. -- literal dot
  • gm -- g flag to match multiple times, m flag to match ^ at the beginning of a line
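
One consequence of [^\.]+ is that the match stops at the first dot after the number, so an item with a dot mid-sentence (a file name, say) is cut short. The sketch below shows this with a made-up input and tries one possible variation that only accepts a dot at the end of a line. Both the input and the variation are assumptions for illustration, and the variation relies on every item actually ending with a period:

```javascript
// Hypothetical input: item 1 contains a dot inside "output.json".
const text = `Some preamble stuff to ignore.
1. It writes results to output.json in the log directory.
2. It reads the log file and returns a dictionary with the set of completed
   tasks and the last task that was submitted.
Some more stuff to ignore`;

// Regex from the answer: stops at the first dot after the number.
const firstDot = text.match(/^\d+\.[^\.]+\./gm) || [];
console.log(firstDot[0]); // "1. It writes results to output."

// Illustrative variation: lazily extend the body until a dot that is followed
// only by optional spaces/tabs and the end of a line.
const lineEndDot = text.match(/^\d+\.[\s\S]*?\.(?=[ \t]*$)/gm) || [];
console.log(lineEndDot); // two items, the second spanning two lines
```

If an item is missing its final period, the lazy body keeps going until the next line that does end with one, so this variation only suits input where the trailing period is guaranteed.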