Get headers from markdown using regex

Time:11-30

I'm trying to extract only the h1 and h2 headers from a markdown file using a regex, but I don't know regex well and can't come up with a correct pattern.

I think this expression is close to a solution: /\#{1,2} (.*?)(\\r\\n|\\r|\\n)/gm

But it also returns headers with more than two hashes.

Test case:

# first \r
## second \r
### third \r## fourth \r

This should return ['first', 'second', 'fourth']
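
The behavior described above can be reproduced with a short JavaScript sketch. It assumes the input contains the literal two-character sequence `\r` (a backslash followed by `r`) rather than real carriage returns, since that is what the doubled backslashes in the pattern match. The names `sample` and `foundWithOriginal` are only for illustration, and the captures are trimmed so the result is easier to compare with the expected list.

```javascript
// Assumption: the text holds a literal backslash + "r", not a real carriage return.
// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`# first \r
## second \r
### third \r## fourth \r`;

// The pattern from the question, unchanged.
const originalPattern = /\#{1,2} (.*?)(\\r\\n|\\r|\\n)/gm;

// Collect capture group 1 of every match and trim the trailing space.
const foundWithOriginal = [...sample.matchAll(originalPattern)].map((m) => m[1].trim());

console.log(foundWithOriginal); // [ 'first', 'second', 'third', 'fourth' ]
```

"third" shows up because the match is allowed to start at the second `#` of `###`, where `##` followed by a space satisfies `#{1,2} `.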

CodePudding user response:

Use

/(?<!#)#{1,2} (.*?)(\\r(?:\\n)?|\\n)/gm

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    #                        '#'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  #{1,2}                   '#' (between 1 and 2 times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
                           ' '
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    r                        'r'
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \\                       '\'
--------------------------------------------------------------------------------
      n                        'n'
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    n                        'n'
--------------------------------------------------------------------------------
  )                        end of \2
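
Applied in JavaScript, the pattern could be used roughly as follows. This is a sketch under a few assumptions: the line-ending markers in the input are literal backslash sequences as in the test case, the runtime supports lookbehind (ES2018 or later), and `extractHeaders` is a hypothetical helper name. Because the lazy capture stops right before the marker, it keeps the space in front of it, so each capture is trimmed.

```javascript
// Hypothetical helper: returns the text of h1 and h2 headers only.
function extractHeaders(markdown) {
  // (?<!#) rejects a match that would start inside a longer run of '#'.
  const pattern = /(?<!#)#{1,2} (.*?)(\\r(?:\\n)?|\\n)/gm;
  return [...markdown.matchAll(pattern)].map((m) => m[1].trim());
}

// Assumption: literal backslash + "r" markers, kept intact by String.raw.
const testInput = String.raw`# first \r
## second \r
### third \r## fourth \r`;

console.log(extractHeaders(testInput)); // [ 'first', 'second', 'fourth' ]
```

Note that the pattern is not anchored to the start of a line, so a `# ` appearing in the middle of ordinary text would also be treated as a header.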