Regex table of contents

Time:12-11

I have table of contents items that I need to parse with a regex. The data is not totally uniform, and I can't get it to work in all cases.

The data is as follows:

1.     Header 1
1.2.  SubHeader2
1.2.1     Subheader 
1.2.2.   Another header
1.2.2.1        Test
1.2.2.2.    Test2

I need to capture the number and the header in separate groups. The number should not include the trailing dot, if there is one. The issue I'm struggling with is that not all of the numbers have the trailing dot.

I have tried

^([0-9\.]+)[\.]\s+(.+)$      -- Doesn't work when there is no trailing dot
^([0-9\.]+)[\.]?\s+(.+)$     -- Group 1 includes the trailing dot if it is there

CodePudding user response:

You can use

^(\d+(?:\.\d+)*)\.?\s+(.+)

See the regex demo. Details:

  • ^ - start of string
  • (\d+(?:\.\d+)*) - Group 1: one or more digits and then zero or more repetitions of a . followed by one or more digits
  • \.? - an optional .
  • \s+ - one or more whitespaces
  • (.+) - Group 2: any one or more chars other than line break chars, as many as possible.
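As a quick check, here is a minimal Python sketch that runs the pattern above against the sample data from the question. The question does not say which language is used, so Python is an assumption, and the names TOC_LINE, data and entries are only illustrative. The re.MULTILINE flag is added so that ^ matches at the start of every line rather than only at the start of the whole string.

```python
import re

# Pattern from the answer. re.MULTILINE makes ^ match at the start of each line.
TOC_LINE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)", re.MULTILINE)

# Sample data copied from the question (note the inconsistent trailing dots).
data = """\
1.     Header 1
1.2.  SubHeader2
1.2.1     Subheader 
1.2.2.   Another header
1.2.2.1        Test
1.2.2.2.    Test2
"""

# Group 1 is the number without the trailing dot, group 2 is the header text.
# rstrip() drops trailing spaces that (.+) would otherwise keep.
entries = [(m.group(1), m.group(2).rstrip()) for m in TOC_LINE.finditer(data)]

for number, title in entries:
    print(number, "|", title)
```

Because the dot is only allowed inside group 1 when digits follow it, a trailing dot can never end up in the captured number; it is consumed by the optional \.? instead.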