Match everything except certain strings

Time:10-07

I am trying to write a Python script with a regex that matches every string containing two "-" and one ".", but I also want to exclude two strings from the matches: NIST-Privacy-v1.1 and NIST-CSF-v1.1.

Here is my sample data:

NIST-Privacy-v1.1
NIST-CSF-v1.1
AWS-CIS-v1.4-1.10
SOC-2-CC6.8
NIST-800-53rev5
NIST-800-53rev5-CM-3(1)
NIST-800-53rev5-AU-12(4)
SOC-2-CC6.1
NISTPrivacyFramework-v1.0
NISTPrivacyFramework-v1.0-PR.AC-P1
NIST-800-53rev5-AC-1
NIST-800-53rev5-IA-1

I started with a very simple regex that matches what I need but doesn't exclude the two strings. Can you help me with the exclusion part?

regex: .*-.*-.*[.|\-].*

Desired output:

AWS-CIS-v1.4-1.10
SOC-2-CC6.8
NIST-800-53rev5-CM-3(1)
NIST-800-53rev5-AU-12(4)
SOC-2-CC6.1
NISTPrivacyFramework-v1.0-PR.AC-P1
NIST-800-53rev5-AC-1
NIST-800-53rev5-IA-1

CodePudding user response:

^(?!NIST-Privacy-v1\.1)(?!NIST-CSF-v1\.1).*-.*-.*[.-].*$

Output:

AWS-CIS-v1.4-1.10
SOC-2-CC6.8
NIST-800-53rev5-CM-3(1)
NIST-800-53rev5-AU-12(4)
SOC-2-CC6.1
NISTPrivacyFramework-v1.0-PR.AC-P1
NIST-800-53rev5-AC-1
NIST-800-53rev5-IA-1

Demo: https://regex101.com/r/gM9e44/1

  • ^ => The match must start at the beginning of the line
  • [.-] => "-" or "."
  • ^(?!NIST-Privacy-v1\.1) => The line must not start with "NIST-Privacy-v1.1"
  • ^(?!NIST-Privacy-v1\.1)(?!NIST-CSF-v1\.1) => The line must not start with "NIST-Privacy-v1.1" or "NIST-CSF-v1.1"
  • $ => The match must end at the end of the line
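
Since the question asks for a Python script, here is a minimal sketch of how the pattern above could be applied with the standard re module. The names SAMPLES, PATTERN, and filter_ids are illustrative and not from the original post; the list simply repeats the sample data from the question.

```python
import re

# Sample data copied from the question.
SAMPLES = [
    "NIST-Privacy-v1.1",
    "NIST-CSF-v1.1",
    "AWS-CIS-v1.4-1.10",
    "SOC-2-CC6.8",
    "NIST-800-53rev5",
    "NIST-800-53rev5-CM-3(1)",
    "NIST-800-53rev5-AU-12(4)",
    "SOC-2-CC6.1",
    "NISTPrivacyFramework-v1.0",
    "NISTPrivacyFramework-v1.0-PR.AC-P1",
    "NIST-800-53rev5-AC-1",
    "NIST-800-53rev5-IA-1",
]

# Two negative lookaheads reject any string that starts with an excluded
# name; the rest requires two hyphens followed later by a "." or "-".
PATTERN = re.compile(r"^(?!NIST-Privacy-v1\.1)(?!NIST-CSF-v1\.1).*-.*-.*[.-].*$")


def filter_ids(values):
    """Return the values that match PATTERN, preserving input order."""
    return [v for v in values if PATTERN.match(v)]


if __name__ == "__main__":
    for value in filter_ids(SAMPLES):
        print(value)
```

Because the lookaheads are not anchored with $, this version also rejects any longer string that merely begins with one of the two excluded names.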

CodePudding user response:

You may use this regex for your job:

^(?!NIST-(?:CSF|Privacy)-v1\.1$)(?:[^-]*-){2}.*[.-].*


RegEx Breakup:

  • ^: Start
  • (?!NIST-(?:CSF|Privacy)-v1\.1$): Negative lookahead that fails the match when the input is exactly NIST-Privacy-v1.1 or NIST-CSF-v1.1
  • (?:[^-]*-){2}: Match 0 or more non-hyphen characters followed by a hyphen. Repeat this group 2 times
  • .*[.-]: Match any text followed by a dot or hyphen
  • .*: Match 0 or more of any character
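
As a rough Python sketch of this second pattern (the names EXACT and is_wanted are illustrative, and "NIST-CSF-v1.1-ID.AM-1" is a made-up identifier that does not appear in the question's data), the following shows what the $ inside the lookahead changes: only the two exact strings are rejected, while a longer string that merely starts with one of them can still match.

```python
import re

# Pattern from the answer above. The "$" inside the lookahead limits the
# exclusion to the two exact strings rather than to every string that
# starts with them.
EXACT = re.compile(r"^(?!NIST-(?:CSF|Privacy)-v1\.1$)(?:[^-]*-){2}.*[.-].*")


def is_wanted(value):
    """Return True when value matches the pattern from its start."""
    return EXACT.match(value) is not None


if __name__ == "__main__":
    checks = [
        "NIST-Privacy-v1.1",      # rejected: exactly an excluded string
        "NIST-CSF-v1.1",          # rejected: exactly an excluded string
        "NIST-800-53rev5",        # rejected: no "." or "-" after the second hyphen
        "SOC-2-CC6.8",            # accepted
        "NIST-CSF-v1.1-ID.AM-1",  # hypothetical ID: accepted, only the prefix is excluded
    ]
    for value in checks:
        print(value, is_wanted(value))
```

Which behavior is preferable depends on whether longer identifiers under the two excluded names should be kept; the question's sample data does not contain such a case.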