Using Regex, Python

Time:11-25

I'm trying to write a regex that meets the following criteria:

  • CS 1110: "Introduction to Programming"
  • ENGR 1624: "Introduction to Engineering"
  • BME 2220: "Biomechanics"

should all match.

  • CS 20: "Introduction to CS"
  • ENGR 1624: " "
  • ENGR 1624: ""

should not match.

This is my code so far:

([A-Z]{2,4})\s([1000-4000]{4})(:)\s(["][a-zA-Z]*\s[a-zA-Z]*?\s[a-zA-Z]*["])

However I'm running into two problems:

  1. When I try to run ENGR 1624, it is not working (I assume because the [1000-4000]{4} part of my code is wrong)
  2. It will not work for just the one word "Biomechanics"

Can anyone help fix my code please???

CodePudding user response:

If you don't want to match an empty string between the double quotes, you can repeat the character class 1 or more times with [a-zA-Z]+ and then optionally repeat a group that starts with a whitespace character followed by the character class again: (?:\s[a-zA-Z]+)*

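About the notation in the pattern: the " does not have to be between square brackets. The character class [1000-4000]{4} is not a numeric range. It is a set containing the characters 1 and 0 plus the range 0-4, so it matches any digit from 0 to 4, repeated 4 times. That is why 1624 fails: the 6 is not in the set.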
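What the character class [1000-4000]{4} really accepts can be checked directly. This is a minimal sketch; the variable names are only illustrative and are not part of the original question.

```python
import re

# Inside square brackets, 1000-4000 is read character by character:
# the literals 1 and 0, plus the range 0-4. It behaves like [0-4].
asker_class = re.compile(r'[1000-4000]{4}')

matches_1110 = asker_class.fullmatch('1110') is not None  # every digit is in 0-4
matches_1624 = asker_class.fullmatch('1624') is not None  # 6 is outside 0-4
matches_0000 = asker_class.fullmatch('0000') is not None  # below 1000, still accepted

print(matches_1110, matches_1624, matches_0000)  # True False True
```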

A range from 1000-4000 can be written as (?:4000|[1-3][0-9]{3}) which matches either 4000 or any number from 1000 to 3999
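One way to convince yourself that the alternation covers exactly 1000 to 4000 is to test it against every number up to 9999. A small sketch, with names chosen only for this example:

```python
import re

# Either the literal 4000, or a first digit 1-3 followed by any three digits.
number = re.compile(r'(?:4000|[1-3][0-9]{3})')

# fullmatch requires the whole string to match, so 20 or 40000 are rejected.
accepted = [n for n in range(0, 10000) if number.fullmatch(str(n))]

print(accepted[0], accepted[-1], len(accepted))  # 1000 4000 3001
```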

You might update the pattern using 3 capture groups instead:

\b([A-Z]{2,4})\s(4000|[1-3][0-9]{3}):\s("[a-zA-Z]+(?:\s[a-zA-Z]+)*")


For example

import re

pattern = r'\b([A-Z]{2,4})\s(4000|[1-3][0-9]{3}):\s("[a-zA-Z]+(?:\s[a-zA-Z]+)*")'

s = ("CS 1110: \"Introduction to Programming\", ENGR 1624: \"Introduction to\n"
    "Engineering\", and BME 2220: \"Biomechanics\"\n\n"
    "CS 20: \"Introduction to CS\", ENGR 1624: \" \", and ENGR 1624: \"\"")

print(re.findall(pattern, s))

Output

[('CS', '1110', '"Introduction to Programming"'), ('ENGR', '1624', '"Introduction to\nEngineering"'), ('BME', '2220', '"Biomechanics"')]
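If each course is a separate string to validate, rather than text to search through, re.fullmatch may be a better fit than re.findall. The sketch below assumes that use case; is_course is a hypothetical helper name, and the \b is dropped because fullmatch already anchors both ends.

```python
import re

COURSE = re.compile(
    r'([A-Z]{2,4})\s(4000|[1-3][0-9]{3}):\s("[a-zA-Z]+(?:\s[a-zA-Z]+)*")'
)

def is_course(line):
    """Return True if the whole string is a valid course entry (hypothetical helper)."""
    return COURSE.fullmatch(line) is not None

samples = [
    'CS 1110: "Introduction to Programming"',
    'ENGR 1624: "Introduction to Engineering"',
    'BME 2220: "Biomechanics"',
    'CS 20: "Introduction to CS"',
    'ENGR 1624: " "',
    'ENGR 1624: ""',
]

# The first three lines print True, the last three print False.
for line in samples:
    print(is_course(line), line)
```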