I am trying to extract the page numbers of PDF pages that contain a certain string and append them to a list. For example, pages 2, 254, 439 and 458 meet the criteria, and I expect the output to be a single list: [2, 254, 439, 458]. My code is:
import re
import PyPDF2

object = PyPDF2.PdfFileReader(file_path)
NumPages = object.getNumPages()
String = 'specific string'
for i in range(0, NumPages):
    PageObj = object.getPage(i)
    Text = PageObj.extractText()
    ReSearch = re.search(String, Text)
    Pagelist = []
    if ReSearch != None:
        Pagelist.append(i)
        print(Pagelist)
I received output as:
- [2]
- [254]
- [439]
- [458]
Could someone please take a look and see how I can fix it? Thank you
CodePudding user response:
Right now you are creating a new list on every iteration, so each match is appended to a fresh, empty list and the earlier matches are lost. Define the list only once, before the loop, and print it after the loop has finished:
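Pagelist = []  # create the list once, before the loop
for i in range(0, NumPages):
    Text = object.getPage(i).extractText()
    if re.search(String, Text) is not None:
        Pagelist.append(i)
print(Pagelist)  # print once, after the loop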
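If you want to check the logic without a PDF at hand, here is a minimal sketch of the same pattern. The helper name find_matching_pages and the sample page texts are made up for illustration; in your script the texts would come from extractText() on each page. Note also that in PyPDF2 3.0 and later the camelCase calls used above were replaced by PdfReader, reader.pages and extract_text(), so you may need those names depending on your installed version.

```python
import re


def find_matching_pages(page_texts, pattern):
    """Return the zero-based indices of the pages whose text matches pattern."""
    matches = []  # created once, before the loop
    for i, text in enumerate(page_texts):
        if re.search(pattern, text) is not None:
            matches.append(i)
    return matches  # the complete list, available only after the loop


# Made-up stand-in for the text extracted from each PDF page.
sample_pages = [
    "introduction",
    "this page has the specific string in it",
    "nothing relevant here",
    "the specific string appears again",
]
print(find_matching_pages(sample_pages, "specific string"))  # [1, 3]
```

The indices are zero-based, just like i in your loop, so add 1 if you want the page numbers a PDF viewer would show.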