How to match number and dot sequence?

Time:10-14

I'm trying to match the string 2.1.34.5.1. with the regex ^((\d{1,2}.)*) on text extracted from a PDF file. However, I can't get it to print the match I expect. This is the text extracted from the PDF page:

93

|
Page


1.5.4 Require Authentication for Single
-

What happens is that the regex matches 93 instead of 1.5.4.

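import re

import PyPDF2

if __name__ == '__main__':
    # Open the PDF in binary mode; the with block closes the file afterwards
    with open('RH5-94.pdf', 'rb') as pdf_file:
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        # Raw strings pass the backslashes through to the regex engine unchanged
        IdCis = r"(\d{1,2}.{0,1})*"
        Description = r"(?<=Description:)(.*)(?=Rationale)"
        Rationale = r"(?<=Rationale:)(.*)(?=Audit)"
        # Extract the text of the first page and print the first ID-like match
        textPage = read_pdf.getPage(0).extractText()
        print(re.search(IdCis, textPage).group(0))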
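The behaviour can be reproduced without the PDF. This sketch uses a hypothetical string that approximates the extracted page text shown above (the exact whitespace PyPDF2 produces is an assumption). re.search returns the leftmost match, and the pattern can already match the digits at position 0, so it stops at 93. Because * also allows zero repetitions, a text that starts with a non-digit even yields an empty match at position 0.

```python
import re

# Hypothetical approximation of the extracted page text (whitespace is assumed)
page_text = "93\n\n|\nPage\n\n\n1.5.4 Require Authentication for Single\n-\n"

# The pattern from the question; the unescaped '.' matches any character except a newline
id_pattern = r"(\d{1,2}.{0,1})*"

# re.search returns the leftmost match: "93" at the very start of the text
first = re.search(id_pattern, page_text).group(0)
print(repr(first))

# Zero repetitions are allowed, so a text starting with a non-digit gives an empty match
empty = re.search(id_pattern, "Page\n1.5.4").group(0)
print(repr(empty))
```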

Answer:

Your first example 2.1.34.5.1. ends with a . and your second 1.5.4 doesn't. For that reason, I am assuming that the sequence may or may not end with a .

Don't forget that to match a literal . you need to escape it with a backslash (\.); an unescaped . matches any character except a newline.

To ensure that there is at least one . in the sequence, + is used rather than * so that the group is matched at least once.

^(?:\d{1,2}\.)+\d{0,2}
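A quick check of this pattern, again on a hypothetical approximation of the extracted page text (the whitespace is an assumption). Since 1.5.4 is not at the very start of the extracted text, ^ only finds it when re.MULTILINE is passed, which makes ^ match at the start of every line; without that flag the search returns None for this text.

```python
import re

# Hypothetical approximation of the extracted page text (whitespace is assumed)
page_text = "93\n\n|\nPage\n\n\n1.5.4 Require Authentication for Single\n-\n"

# One or more "1-2 digits followed by a literal dot", then up to two trailing digits
id_pattern = r"^(?:\d{1,2}\.)+\d{0,2}"

# re.MULTILINE lets ^ match at the start of each line, not only the start of the text
match = re.search(id_pattern, page_text, re.MULTILINE)
print(match.group(0))  # 1.5.4

# Without re.MULTILINE, ^ only matches at position 0, where "93" has no dot after it
no_flag = re.search(id_pattern, page_text)
print(no_flag)  # None

# A sequence ending with a dot is matched in full
print(re.search(id_pattern, "2.1.34.5.1.").group(0))  # 2.1.34.5.1.
```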

Answer:

Try this:

(\d{1,2}.{0,1})*

Your problem is that your regex demands a dot at the end.
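To illustrate the point about the trailing dot: the question's original pattern ^((\d{1,2}.)*) requires a dot after every number, including the last one, while making the dot optional ({0,1}, or equivalently ?) lets 1.5.4 match in full. This sketch escapes the dot and uses re.fullmatch to test whole strings; both are choices made for the illustration, not part of the answer above.

```python
import re

# Dot required after every number, including the last (escaped to mean a literal '.')
dot_required = r"(\d{1,2}\.)*"
# Dot optional after each number; '?' is shorthand for {0,1}
dot_optional = r"(\d{1,2}\.?)*"

for candidate in ["1.5.4", "2.1.34.5.1."]:
    # fullmatch succeeds only if the pattern consumes the entire string
    print(candidate,
          re.fullmatch(dot_required, candidate) is not None,
          re.fullmatch(dot_optional, candidate) is not None)
```

One caveat: with the dot optional, the pattern also accepts digit runs with no dots at all; for example, re.fullmatch(dot_optional, "12345") succeeds by splitting the digits into 12, 34 and 5.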
