Regex - named groups multiple match by patterns

Time:05-11

Below is a sample string from which I would like to extract some information for an analytics system:

PRMS: {"load_numbers"=>"12345678", "shipper"=>"some-shipper", "app_id"=>"my_app_id", "timestamp"=>"20220502081520", "action"=>"index", "referencenumbers"=>["12342", "22342", "32323"]}

I would like to extract the values of app_id, referencenumbers, and shipper (the text after => and between the quotes). I wrote a regex with named groups (tested on regex101) to extract them:

(?<loadnumbers>\"load_numbers\"=>\"*\"(.*?)\")|(?<appid>\"app_id\"=>\"([^\s]+)\")|(?<shipper>\"shipper\"=>\"(.*?)\")|(?<referenceNumbers>\"referenceNumbers\"|\"ReferenceNumbers\"=>\[(.*?)\])|(?<carrier>\"carrierName\"=>\"(.*?)\")|(?<trackingid>\"tracking_id|\"trackingid\"=>\"(.*?)\")

However, each named group captures the key together with the value, and the value on its own ends up in the next (unnamed) group. Note that not every key is present in every string, so only what is available should be extracted. The keys mostly appear in the same order. How do I fix this?

CodePudding user response:

You need to put the group name where you actually capture the value, so for example:

(?:\"load_numbers\"=>\"*\"(?<loadnumbers>[^\"]*)\")
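To make the difference concrete, here is a minimal Python sketch that contrasts the two placements on a single key. It is an illustration only: the `sample` string is a shortened, hypothetical excerpt of the input, and Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Hypothetical one-key excerpt of the input string.
sample = '"shipper"=>"some-shipper"'

# Name on the outer group: it captures the key, the arrow, and the quotes.
outer = re.search(r'(?P<shipper>"shipper"=>"(.*?)")', sample)

# Name on the inner group: it captures only the value.
inner = re.search(r'"shipper"=>"(?P<shipper>[^"]*)"', sample)

print(outer.group("shipper"))  # "shipper"=>"some-shipper"
print(outer.group(2))          # some-shipper (the unnamed inner group)
print(inner.group("shipper"))  # some-shipper
```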

Your complete regex (with some simplifications and optimisations, split across lines here for readability) then becomes:

(?:\"load_numbers\"=>\"*\"(?<loadnumbers>[^\"]*)\")|
(?:\"app_?id\"=>\"(?<appid>\S+)\")|
(?:\"shipper\"=>\"(?<shipper>[^\"]*)\")|
(?:\"[Rr]eference[nN]umbers\"=>\[(?<referenceNumbers>[^]]*)\])|
(?:\"carrierName\"=>\"(?<carrier>[^\"]*)\")|
(?:\"tracking_?id\"=>\"(?<trackingid>[^\"]*)\")

Demo on regex101
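As a rough sketch of how the complete pattern could be applied in code, the Python snippet below loops over all matches and collects whichever named groups are present. The function name `extract_fields` and the variable names are made up for illustration, and the pattern is rewritten with Python's `(?P<name>...)` syntax, so treat it as an adaptation rather than the exact regex above.

```python
import re

# Same alternatives as the regex above, in Python's named-group syntax.
# re.VERBOSE lets the pattern span several lines; it contains no literal
# whitespace or '#', so the flag does not change what is matched.
PATTERN = re.compile(r'''
    "load_numbers"=>"*"(?P<loadnumbers>[^"]*)"
  | "app_?id"=>"(?P<appid>\S+)"
  | "shipper"=>"(?P<shipper>[^"]*)"
  | "[Rr]eference[nN]umbers"=>\[(?P<referenceNumbers>[^\]]*)\]
  | "carrierName"=>"(?P<carrier>[^"]*)"
  | "tracking_?id"=>"(?P<trackingid>[^"]*)"
''', re.VERBOSE)


def extract_fields(line):
    """Return a dict of every named group that matched somewhere in line."""
    fields = {}
    for match in PATTERN.finditer(line):
        # Each match comes from exactly one alternative, and each alternative
        # has a single named group, so lastgroup identifies which key was hit.
        fields[match.lastgroup] = match.group(match.lastgroup)
    return fields


line = ('PRMS: {"load_numbers"=>"12345678", "shipper"=>"some-shipper", '
        '"app_id"=>"my_app_id", "timestamp"=>"20220502081520", '
        '"action"=>"index", "referencenumbers"=>["12342", "22342", "32323"]}')

fields = extract_fields(line)
print(fields["appid"])    # my_app_id
print(fields["shipper"])  # some-shipper

# The reference numbers arrive as one quoted, comma-separated string;
# a second pass splits them into individual values.
reference_numbers = re.findall(r'"([^"]*)"', fields["referenceNumbers"])
print(reference_numbers)  # ['12342', '22342', '32323']
```

Keys that are absent from the input (here `carrier` and `trackingid`) simply do not appear in the resulting dict, which fits the requirement to extract only what is available.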
