I would like to extract ISO and ASTM standard designations from a text, that is, the literal ISO or ASTM followed by its number.
Rules:
- Match starts with ISO or ASTM
- ASTM is followed by a D
- This is followed by a number, optionally preceded by a space or hyphen, which may itself contain spaces and hyphens
- As soon as the number sequence ends, the match ends
Possible pattern for the first two rules:
(?:ISO|ASTM\s*D)
Example:
ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. ISO 31 33, ISO 334 9 are specific to static bending, but ASTM D 149-3 includes various other 9.
https://regex101.com/r/IFlqT2/1
What would a corresponding regex look like?
CodePudding user response:
You can use
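(?:ISO|ASTM\s*D)(?:[\s-]*\d)+
Details:
- (?:ISO|ASTM\s*D) - either ISO, or ASTM followed by zero or more whitespaces and D
- (?:[\s-]*\d)+ - one or more repetitions of
  - [\s-]* - zero or more whitespaces or hyphens
  - \d - a digit.
Because every repetition must end in a digit, the match stops at the last digit and never includes trailing spaces or hyphens.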
See the regex demo.
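As a quick check outside regex101, here is a minimal Python sketch that runs the pattern against the example sentence from the question. It assumes the repeated group carries a + quantifier, as the "one or more repetitions" description implies. The names PATTERN, text and matches are illustrative only.

```python
import re

# Prefix (ISO, or ASTM + optional whitespace + D), then one or more
# digits, each optionally preceded by whitespace or hyphens.
PATTERN = re.compile(r"(?:ISO|ASTM\s*D)(?:[\s-]*\d)+")

# Example sentence taken from the question.
text = (
    "ISO 527-1, DIN EN ISO 3349-3, and ASTM D143 are all testing standards. "
    "ISO 31 33, ISO 334 9 are specific to static bending, "
    "but ASTM D 149-3 includes various other 9."
)

# The pattern has no capturing groups, so findall returns the full matches.
matches = PATTERN.findall(text)
print(matches)
# ['ISO 527-1', 'ISO 3349-3', 'ASTM D143', 'ISO 31 33', 'ISO 334 9', 'ASTM D 149-3']
```

The trailing 9 in "various other 9" is not matched because it has no ISO or ASTM D prefix.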