Regular expression: Match everything after a particular word until multiple occurrences of carriage re

Time:02-20

I am using Python and would like to match all the words after "Examination(s):" till one or more empty lines occur.

text = "Examination(s):\sMathematics 2nd Paper\r\n\r\nTimeTable"
text = "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah"
text = "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"

In all the above examples, my output should be "Mathematics 2nd Paper". Here is what I tried:

import re
pat = re.compile(r'(?:Examination\(s\):)[^\r\n]*')
re.search(pat,text)

The above snippet works fine for example 2 (one occurrence of \r\n), but is not working for examples 1 and 3.

I get this error when I try to apply your pattern, @Wiktor.

Updating the question to capture a scenario I missed: either a space or a newline can follow the colon.

CodePudding user response:

To get the line after Examination(s): you can use

re.search(r'Examination\(s\):\s*([^\r\n]+)', text)

See the regex demo. Details:

  • Examination\(s\): - a literal Examination(s): string
  • \s* - zero or more whitespaces
  • ([^\r\n]+) - Group 1: one or more chars other than CR and LF chars.
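
As a rough illustration of why the `\s*` and the capturing group matter, the sketch below runs the pattern from the question and the pattern above on one sample string. The variable names (`sample`, `question_pat`, `answer_pat`) are made up for this example.

```python
import re

# Sample input: the label, a CRLF, the wanted line, then a blank line.
sample = "Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable"

# Pattern from the question: the non-capturing group is still part of the
# overall match, and [^\r\n]* stops at the CR that follows the colon.
question_pat = re.compile(r'(?:Examination\(s\):)[^\r\n]*')

# Pattern from this answer: \s* skips the line breaks, Group 1 holds the text.
answer_pat = re.compile(r'Examination\(s\):\s*([^\r\n]+)')

question_result = question_pat.search(sample).group(0)
answer_result = answer_pat.search(sample).group(1)

print(repr(question_result))  # 'Examination(s):'
print(repr(answer_result))    # 'Mathematics 2nd Paper'
```

Reading `group(1)` rather than the whole match is what drops the `Examination(s):` label from the result.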

See the Python demo:

import re
texts = ["Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable",
    "Examination(s):\r\nMathematics 2nd Paper\r\nblahblah",
    "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"]
 
for text in texts:
    m = re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
    print(f'--- {repr(text)} ---')
    if m:
        print(m.group(1))

Output:

--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\nblahblah' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks' ---
Mathematics 2nd Paper
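
The updated question also allows a space, or several line breaks, between the colon and the text. Since `\s*` matches spaces, CR and LF alike, the same pattern should cover those cases as well. Here is a minimal sketch: the helper name `extract_exam` is hypothetical, and the first string assumes that the `\s` in the question's first example stands for a literal space.

```python
import re

# Same pattern as above, compiled once for reuse.
EXAM_RE = re.compile(r'Examination\(s\):\s*([^\r\n]+)')

def extract_exam(text):
    """Return the text following 'Examination(s):' up to the next CR/LF, or None."""
    m = EXAM_RE.search(text)
    return m.group(1) if m else None

updated_texts = [
    # Assumption: a plain space after the colon.
    "Examination(s): Mathematics 2nd Paper\r\n\r\nTimeTable",
    # Two CRLFs between the colon and the text.
    "Examination(s):\r\n\r\nMathematics 2nd Paper\r\nblahblah",
    # Several blank lines after the text.
    "Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks",
]

results = [extract_exam(t) for t in updated_texts]
for r in results:
    print(r)  # Mathematics 2nd Paper
```

If the captured line can carry trailing spaces, calling `.strip()` on the result is one way to trim them.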