I'm trying to match strings like 2.1.34.5.1. using the regex ^((\d{1,2}.)*) on text extracted from a PDF file. However, I can't get it to print the value I expect. This is the text extracted from the PDF page:
93
|
Page
1.5.4 Require Authentication for Single
-
What is happening here is that the regex matches 93 instead of 1.5.4.
import PyPDF2
import re
import sys

if __name__ == '__main__':
    pdf_file = open('RH5-94.pdf', 'rb')
    read_pdf = PyPDF2.PdfFileReader(pdf_file)
    number_of_pages = read_pdf.getNumPages()
    IdCis = r"(\d{1,2}.{0,1})*"
    Description = r"(?<=Description:)(.*)(?=Rationale)"
    Rationale = r"(?<=Rationale:)(.*)(?=Audit)"
    textPage = read_pdf.getPage(0).extractText()
    print(re.search(IdCis, textPage).group(0))
CodePudding user response:
Your first example, 2.1.34.5.1., ends with a . and your second, 1.5.4, doesn't. For that reason, I am assuming that the sequence may or may not end with a . character.
Don't forget that to match a literal . you escape it with a backslash.
To ensure that there is at least one . in the sequence, + rather than * is used, so the group has to match at least once:
^(?:\d{1,2}\.)+\d{0,2}
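As a rough sketch of how this could be applied to the extracted page, the snippet below uses the pattern with the + quantifier, ^(?:\d{1,2}\.)+\d{0,2}. The page_text string and the find_id helper are hypothetical stand-ins, and the real extractText() output may break lines differently. Since ^ only matches at the very start of the string by default, re.MULTILINE is assumed here so that an ID at the start of any line can be found.

```python
import re

# Hypothetical stand-in for the text returned by extractText();
# the real line breaks in the PDF output may differ.
page_text = "93\n|\nPage\n1.5.4 Require Authentication for Single\n-"

# One or more "digits + literal dot" groups, then an optional final number.
# re.MULTILINE lets ^ match at the start of every line, not only the string.
ID_PATTERN = re.compile(r"^(?:\d{1,2}\.)+\d{0,2}", re.MULTILINE)


def find_id(text):
    """Return the first dotted ID found at the start of a line, or None."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None


print(find_id("2.1.34.5.1."))  # 2.1.34.5.1.
print(find_id(page_text))      # 1.5.4
```

Because at least one literal dot is required, a bare page number such as 93 is skipped. Checking for None before calling group(0) avoids an AttributeError on pages that contain no ID.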
CodePudding user response:
Try this:
(\d{1,2}.{0,1})*
Your problem is that your regex demands a dot at the end.
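To see how an optional dot behaves, here is a small comparison of the pattern above with a variant that escapes the dot. The sample strings and the leading_match helper are hypothetical, and the real extractText() output may be laid out differently. An unescaped . matches any character except a newline, and because the dot is optional in both patterns, a bare number like 93 at the start of the text also matches.

```python
import re

# Hypothetical samples; the real extractText() output may be laid out differently.
samples = ["1.5.4 Require Authentication", "93\n|\nPage"]

# "." matches any character except a newline.
UNESCAPED = re.compile(r"(\d{1,2}.{0,1})*")
# "\." matches only a literal dot.
ESCAPED = re.compile(r"(\d{1,2}\.?)*")


def leading_match(pattern, text):
    """Return whatever the pattern matches at the very start of text."""
    # Both patterns can match the empty string, so match() never returns None.
    return pattern.match(text).group(0)


for text in samples:
    print(repr(leading_match(UNESCAPED, text)), repr(leading_match(ESCAPED, text)))
# '1.5.4 ' '1.5.4'
# '93' '93'
```

With the unescaped dot the trailing space is swallowed into the match. In both cases the page number is still returned when it comes first, so the pattern may need to require at least one literal dot to skip it.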