I am using a function to count occurrences of a given word in PDF files with PyPDF2. While the function is running, I get this message in the terminal:
FloatObject (b'0.000000000000-14210855') invalid; use 0.0 instead
My code:
import os
import re

import PyPDF2


def count_words(word):
    print()
    print('Counting words..')
    files = os.listdir('./pdfs')
    # Escape the word so any regex metacharacters in it are matched literally.
    pattern = re.compile(rf'\b{re.escape(word)}\b', flags=re.I)
    counted_words = []
    for idx, file in enumerate(files, 1):
        with open(f'./pdfs/{file}', 'rb') as pdf_file:
            ReadPDF = PyPDF2.PdfFileReader(pdf_file, strict=False)
            pages = ReadPDF.numPages
            words_count = 0
            for page in range(pages):
                pageObj = ReadPDF.getPage(page)
                data = pageObj.extract_text()
                # Accumulate across pages instead of overwriting the total.
                words_count += len(pattern.findall(data))
        counted_words.append(words_count)
        print(f'File: {idx}')
    return counted_words
How can I get rid of this message?
CodePudding user response:
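PyPDF2 reports this message as a warning through Python's logging module, so you can hide it by raising the level of the "PyPDF2" logger to ERROR. See https://pypdf2.readthedocs.io/en/latest/user/suppress-warnings.html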
import logging
logger = logging.getLogger("PyPDF2")
logger.setLevel(logging.ERROR)
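
To see why setting the level on the "PyPDF2" logger is enough, here is a minimal sketch that uses only the standard library and does not import PyPDF2. The child logger name "PyPDF2.generic._base" and the ListHandler class are assumptions made for illustration; the point is only that a logger below "PyPDF2" in the hierarchy inherits the level set on its parent.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: stores log messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logging.getLogger().addHandler(handler)  # attach to the root logger

# Assumed name, standing in for a module-level logger inside the library.
child = logging.getLogger("PyPDF2.generic._base")

# Before the change: the warning propagates to the root handler.
child.warning("FloatObject invalid; use 0.0 instead")

# Raise the threshold on the parent; child loggers inherit it.
logging.getLogger("PyPDF2").setLevel(logging.ERROR)

child.warning("FloatObject invalid; use 0.0 instead")  # now dropped
child.error("a real error")                            # still recorded

print(handler.messages)
```

Only the first warning and the error end up in the list, so warnings are hidden while genuine errors from the library would still be shown. Run the two setLevel lines once, before calling count_words.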