How can I extract ISO and ASTM standards from a text using regex?

Time:05-06

I would like to extract ISO and ASTM standard designations from a text. That means finding the literal ISO or ASTM together with the number that follows it.

Rules:

  • Match starts with ISO or ASTM
  • ASTM is followed by a D
  • This is followed by a number, optionally preceded by a space or hyphen, which may itself contain spaces and hyphens
  • As soon as the number sequence ends, the match ends

Possible pattern for the first two rules:

(?:ISO|ASTM\s*D)

Example:

ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. ISO 31 33, ISO 334 9 are specific to static bending, but ASTM D 149-3 includes various other 9.

https://regex101.com/r/IFlqT2/1

What would a corresponding regex look like?

CodePudding user response:

You can use

(?:ISO|ASTM\s*D)(?:[\s-]*\d)+

Details:

  • (?:ISO|ASTM\s*D) - either ISO, or ASTM followed by zero or more whitespace characters and then D
  • (?:[\s-]*\d)+ - one or more repetitions of
    • [\s-]* - zero or more whitespace characters or hyphens
    • \d - a digit.

See the regex demo.
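
To check the pattern outside an online tester, here is a minimal Python sketch that runs it over the example sentence from the question. The names `STANDARD_RE` and `extract_standards` are made up for illustration and are not part of the original answer; the pattern includes the `+` quantifier that the "one or more repetitions" bullet describes.

```python
import re

# Prefix (ISO, or ASTM + optional whitespace + D), then one or more
# digits, each optionally preceded by whitespace and/or hyphens.
STANDARD_RE = re.compile(r"(?:ISO|ASTM\s*D)(?:[\s-]*\d)+")


def extract_standards(text):
    """Return every ISO / ASTM D designation found in `text` (hypothetical helper)."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return STANDARD_RE.findall(text)


sample = (
    "ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. "
    "ISO 31 33, ISO 334 9 are specific to static bending, "
    "but ASTM D 149-3 includes various other 9."
)

matches = extract_standards(sample)
print(matches)
# ['ISO 527-1', 'ISO 3349-3', 'ASTM D143', 'ISO 31 33', 'ISO 334 9', 'ASTM D 149-3']
```

Each match ends at the last digit because `[\s-]*` can only be consumed when a digit follows it, so trailing spaces and the final standalone 9 are left out. If a prefix could appear inside a longer word, adding `\b` before the group may help, though that goes beyond the rules stated in the question.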
