I am new to Python (as in, I learned about it a few days ago) and I am supposed to preprocess some PDFs in a folder. I am supposed to remove punctuation, make everything lowercase, remove stopwords, and add some extra data from another CSV to it as metadata, the usual. But I cannot even open them. Googling has not helped, since I do not understand the error message (none of the examples from other people helped, since they had different data types).
Can someone help me please? This is my code so far:
import PyPDF2
import re
for k in range(1,312):
    # open the pdf file
    object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))
And this is what happens:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [37], in <cell line: 4>()
2 import re
4 for k in range(1,312):
5 # open the pdf file
----> 6 object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))
TypeError: not all arguments converted during string formatting
Please help, I don't have time for this.
CodePudding user response:
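The string in your call has no placeholder, so the % operator has nowhere to put k, and that is what raises the TypeError. Add a %s where the number should go:
object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve%s" % str(k))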
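To see why the original line fails and what the placeholder changes, here is a minimal sketch that only builds the path strings and does not open any files. The helper name build_paths is made up for illustration, and the path is copied from the question as-is; whether your files actually carry an extension such as .pdf after the number is something only you can check.

```python
def build_paths(base, start, stop):
    """Append each number in range(start, stop) to base using %-formatting.

    build_paths is a hypothetical helper for illustration only.
    """
    # "%s%d" has two placeholders, so it needs a tuple with two values.
    return ["%s%d" % (base, k) for k in range(start, stop)]


# Path taken verbatim from the question.
base = "/Users/n_n/Desktop/Digitalization/reserve"

# Reproduce the original error: the string contains no % placeholder,
# so Python cannot use the value on the right-hand side.
try:
    base % (1)
except TypeError as exc:
    error_message = str(exc)

# Same range as in the question: 1 up to and including 311.
paths = build_paths(base, 1, 312)

# Each entry in paths could then be passed to the reader call from the
# question, e.g. PyPDF2.PdfFileReader(path), inside a loop.
```

Printing the first few entries of paths before opening anything is a quick way to confirm the names match the files in the folder.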