How to extract parameter definitions using regex?

Time:08-24

I am trying to extract parameter definitions from a Jenkins script and can't work out an appropriate regex (I'm working in Dyalog APL, which supports PCRE8).

Here's what the subject looks like:

pipeline {                                 
  agent none                                  
  parameters {                                
     string(name: 'foo', defaultValue: 'bar')
     string(name: 'goo', defaultValue: 'hoo')
  }                                           
  stages {                                    
    stage('action') {                       
      steps {                             
        echo "foo = ${params.foo}"      
      }                                   
    }                                       
  }                                           
}                                               

I would like to get the individual param definitions captured in group 1 (in other words: I'm looking for a result that reports two matches: string(name: 'foo', defaultValue: 'bar') and string(name: 'goo', defaultValue: 'hoo')), but the matches are either too long or too short (depending on greediness).

My regex: parameters\s*{(\s*\D*\(.*\)\s*)*} (with "dot matches newline" enabled)

Parameter types may vary, so my best idea was to use \D* for those (any # of non-digits). I suspect that this captures more than I expected, but replacing it with \w did not help.

An alternative idea was parameters\s*{(\s*(\w*)\(([^\)]*)\))*\s*}

which seemed more precise wrt matching parameter types and also the content of the parens - but surprisingly that returned goo only and skipped foo.

What am I missing?

CodePudding user response:

Using PCRE you can use this regex in MULTILINE mode:

(?m)(?:^\h*parameters\h*{|(?!^)\G).*\R\h*\w+\(\w+:\h*'\K[^']+


RegEx Details:

  • (?m): Enable MULTILINE mode
  • (?:: Start non-capture group
    • ^\h*parameters\h*{: Match a line that starts with parameters {
    • |: OR
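    • (?!^)\G: Continue from the end of the previous match; (?!^) stops \G from matching at the start of a line, so it cannot apply at the very start of the subject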
  • ): End non-capture group
  • .*: Match the rest of the current line (any characters except line breaks)
  • \R: Match a line break
  • \h*: Match 0 or more horizontal whitespace characters
  • \w+: Match 1+ word chars
  • \(: Match (
  • \w+: Match 1+ word chars
  • :: Match a :
  • \h*: Match 0 or more horizontal whitespace characters
  • ': Match a '
  • \K: Reset the start of the reported match (everything matched so far is discarded)
  • [^']+: Match 1+ of any char that is not ' (this is our parameter name)
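
If the whole definition is wanted rather than just the name, a two-step approach may be easier to reason about. The sketch below is in Python purely for illustration; Python's re module has no \G or \K, so it swaps in a different technique: first isolate the body of the parameters block, then collect one match per definition. The names JENKINSFILE and extract_param_definitions are hypothetical, and the sketch assumes there are no nested braces inside the block and no nested parentheses inside a definition.

```python
import re

# Sample subject, copied from the question.
JENKINSFILE = """\
pipeline {
  agent none
  parameters {
     string(name: 'foo', defaultValue: 'bar')
     string(name: 'goo', defaultValue: 'hoo')
  }
  stages {
    stage('action') {
      steps {
        echo "foo = ${params.foo}"
      }
    }
  }
}
"""


def extract_param_definitions(script):
    """Return each parameter definition inside the parameters { ... } block."""
    # Step 1: isolate the block body. Assumption: no nested braces inside it.
    block = re.search(r"parameters\s*\{([^{}]*)\}", script)
    if block is None:
        return []
    # Step 2: one match per definition: a type name followed by a
    # parenthesised argument list. Assumption: no nested parentheses.
    return re.findall(r"\w+\([^()]*\)", block.group(1))


definitions = extract_param_definitions(JENKINSFILE)

# Pull the name out of each definition.
names = [re.search(r"name:\s*'([^']+)'", d).group(1) for d in definitions]

# For comparison: the second pattern from the question. A capture group
# inside a repeated group is overwritten on every iteration, so only the
# last iteration (the 'goo' definition) is still available afterwards.
repeated = re.search(r"parameters\s*\{(\s*(\w*)\(([^)]*)\))*\s*\}", JENKINSFILE)
last_only = repeated.group(3)

if __name__ == "__main__":
    for definition in definitions:
        print(definition)
    print(names)
    print(last_only)
```

This also shows why the second attempt in the question reported goo only: a capture group inside a repeated group keeps just its last iteration, so the repetition has to be moved out of the pattern (several matches, as here) or driven by \G as in the regex above.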