How to match lines in a numbered list with a regex

Time:01-06

I want to search for all lines that:

  • start with one or more digits (the "numeric-repeat")
  • where that run of digits is not followed by a dot and a whitespace character
  • where either a single dot or a letter directly after the digits is okay

Given Lines

1. TEST 1 : DataLogFile
11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
111.TEST4 : Match this
111TEST4 : Match this

Expected Result

It should match only the last 2 lines:

111.TEST4 : Match this
111TEST4 : Match this
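As a reference point, the three rules above can be restated as a plain Python predicate with no regex at all. This is only a sketch: `is_wanted` is a hypothetical helper name, and it assumes "numeric-repeat" means one or more ASCII digits and that any non-whitespace character may follow the dot.

```python
# Hypothetical helper: a regex-free restatement of the rules in the question.
DIGITS = "0123456789"


def is_wanted(line: str) -> bool:
    """True if `line` starts with digits followed by either a letter,
    or a dot and then a non-whitespace character (assumed reading)."""
    i = 0
    while i < len(line) and line[i] in DIGITS:
        i += 1
    if i == 0:
        return False  # rule 1: the line must start with a digit
    rest = line[i:]
    if rest.startswith("."):
        # rule 2: "digits + dot + whitespace" is a numbered-list item, reject it
        return len(rest) > 1 and not rest[1].isspace()
    # rule 3: no dot, so a letter must follow the digits directly
    return rest[:1].isalpha()


lines = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

for line in lines:
    print(is_wanted(line), line)
```

A predicate like this is handy for checking any candidate regex against the same sample lines.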

1. Regex

I tried the regex ^[0-9]+(?!. ).* to match only the last lines, because there is no whitespace character after the dot.

Tested in Regex101

Actual Result

It matched the last 4 lines:

11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
111.TEST4 : Match this
111TEST4 : Match this
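The four matches come from backtracking: when the negative lookahead rejects the full run of digits, the engine gives one digit back and tries again. A small Python sketch (standard `re` module; the pattern is copied unchanged from the question) reproduces the result and shows where the digit run ends up:

```python
import re

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# The pattern from the question, unchanged.
question_pattern = re.compile(r"^[0-9]+(?!. ).*")

matched = [line for line in LINES if question_pattern.match(line)]
print(matched)  # the last four lines

# Same idea with the digits captured: on "11. TEST 2 ..." the greedy [0-9]+
# first takes "11", the lookahead rejects ". ", so the engine backtracks to
# "1", where the lookahead sees "1." (no space) and the match succeeds.
probe = re.match(r"^([0-9]+)(?!. )", "11. TEST 2 : Inter Citro File")
print(probe.group(1))
```

The single-digit line "1. TEST 1" is the only one rejected, because there is no shorter run of digits to fall back to.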

2. Regex from an answer

When I try SaSkY's first answer, ^\d+\.\S.*, it only matches lines that have digits, then a dot, then a non-whitespace character, then anything else. See Demo

But input without a dot after the digits does not match, although I expect 111TEST4 : Match this to match as well.
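A quick Python check of that pattern against the given lines (a sketch using the standard `re` module, not the original demo) confirms that only the dotted line survives, because `\.` is mandatory:

```python
import re

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# Digits, a required literal dot, then one non-whitespace character.
dot_only = re.compile(r"^\d+\.\S.*")

result = [line for line in LINES if dot_only.match(line)]
print(result)
```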

Answer:

Try this:

^\d+(?:\.\S|[A-Z]).*
  • ^ start of the line.

  • \d+ one or more digits.

  • (?:\.\S|[A-Z]) non-capturing group:

    • \. a literal dot.
    • \S any character except a whitespace character.
    • | OR.
    • [A-Z] a capital letter.
  • .* zero or more characters.

See regex demo
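A minimal Python sketch (standard `re` module; `LINES` is just the sample input from the question) to check the pattern. Note that `[A-Z]` is case-sensitive, so a lowercase letter right after the digits is not accepted:

```python
import re

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# Digits, then either "dot + non-whitespace" or a capital letter.
answer = re.compile(r"^\d+(?:\.\S|[A-Z]).*")

matched = [line for line in LINES if answer.match(line)]
print(matched)  # the last two lines

# [A-Z] is case-sensitive: a lowercase letter after the digits fails.
print(bool(answer.match("111test6: Match this")))
```

Backtracking does not hurt here: giving a digit back only leaves another digit in front of the group, which neither `\.` nor `[A-Z]` accepts.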

Answer:

You can try:

^(\d)\1*+(?!\.?\s+).*$

Regex demo.


Or, if you want any number at the beginning (not only a repeated digit such as 111):

^\d++(?!\.?\s+).*$
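The `*+` and `++` in these patterns are possessive quantifiers: they keep every digit they matched, so the lookahead cannot succeed by backtracking to a shorter run of digits (which is exactly what made the question's regex match four lines). The sketch below is in Python and, as an assumption for portability, emulates the possessive quantifier with the `(?=(\d+))\1` idiom, since Python's `re` only accepts `++` and `*+` from version 3.11. A lookahead is never re-entered on backtracking, so the captured digits behave atomically.

```python
import re
import sys

LINES = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "111.TEST4 : Match this",
    "111TEST4 : Match this",
]

# Plain greedy \d+: the engine can give digits back until the lookahead passes.
greedy = re.compile(r"^\d+(?!\.?\s+).*$")

# Emulates ^\d++(?!\.?\s+).*$ : the lookahead captures all leading digits,
# \1 consumes exactly those, and the engine never re-enters the lookahead.
atomic_digits = re.compile(r"^(?=(\d+))\1(?!\.?\s+).*$")

# Emulates ^(\d)\1*+(?!\.?\s+).*$ : a run of one repeated digit, kept whole.
atomic_repeat = re.compile(r"^(?=((\d)\2*))\1(?!\.?\s+).*$")


def matching(pattern):
    """Return the sample lines that `pattern` matches at the start."""
    return [line for line in LINES if pattern.match(line)]


print(matching(greedy))         # also matches "11. ..." and "111. ..." via backtracking
print(matching(atomic_digits))  # only the last two lines
print(matching(atomic_repeat))  # only the last two lines

# On Python 3.11+ the possessive syntax from the answer can be used directly.
if sys.version_info >= (3, 11):
    possessive = re.compile(r"^\d++(?!\.?\s+).*$")
    print(matching(possessive) == matching(atomic_digits))
```

Engines that support possessive quantifiers natively (such as PCRE, which regex101 uses by default) can take the answer's patterns as written.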

Answer:

You should have stated your expectations clearly before asking.

If you would like to

  • match: any "identifier" or word that is either prefixed with a number (e.g. 1Hello) or is prefixed with an ordinal (e.g. 2.World)
  • But not: a phrase containing a space, as in a numbered-list entry (e.g. 1. Hello)

Simple regex sequentially built

Then ^\d+\.?[a-zA-Z].*

Matches:

111.TEST4 : Match this
111TEST5: Match this
111test6: Match this

It does not match the numbered-list items, where a space separates the number (or the dot after it) from the text, nor anything that starts with a letter. These do not match:

1. TEST 1 : DataLogFile
11. TEST 2 : Inter Citro File
111. TEST 3 : Inter Citro File
test7: should not match
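The two lists above can be checked with a short Python sketch (standard `re` module; the list names are made up for illustration):

```python
import re

should_match = [
    "111.TEST4 : Match this",
    "111TEST5: Match this",
    "111test6: Match this",
]
should_not_match = [
    "1. TEST 1 : DataLogFile",
    "11. TEST 2 : Inter Citro File",
    "111. TEST 3 : Inter Citro File",
    "test7: should not match",
]

# Digits, an optional dot, then a letter of either case immediately after.
pattern = re.compile(r"^\d+\.?[a-zA-Z].*")

for line in should_match + should_not_match:
    print(bool(pattern.match(line)), line)
```

Unlike the `[A-Z]` answer, `[a-zA-Z]` also accepts a lowercase letter after the digits, which is why "111test6" matches here.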
