Opening and preprocessing text (300 PDFs) in Python

Time:06-28

I am new to Python (I learned about it a few days ago) and I need to preprocess some PDFs in a folder. I am supposed to remove punctuation, make everything lower case, remove stopwords, and add some extra data from another CSV file as metadata: the usual. But I cannot even open them. Googling has not helped, because I do not understand the error message (none of the examples from other people helped, since they had different data types).
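The cleaning steps listed above (lower case, no punctuation, no stopwords, plus metadata from a CSV) can be sketched with the standard library alone. This is a minimal illustration, not a definitive pipeline: the short STOPWORDS set, the helper names preprocess and load_metadata, and the CSV columns "file" and "year" are all hypothetical. A real project would typically use a fuller stopword list, for example NLTK's stopwords corpus.

```python
from __future__ import annotations

import csv
import io
import string
from typing import Iterable, TextIO

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, stopwords: Iterable[str] = STOPWORDS) -> list[str]:
    """Lower-case the text, strip punctuation and drop stopwords."""
    stop = set(stopwords)
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [word for word in cleaned.split() if word not in stop]


def load_metadata(csv_file: TextIO, key_column: str) -> dict[str, dict[str, str]]:
    """Read a CSV with a header row and index each row by key_column."""
    return {row[key_column]: row for row in csv.DictReader(csv_file)}


if __name__ == "__main__":
    # Hypothetical metadata CSV with assumed columns "file" and "year".
    meta_csv = io.StringIO("file,year\nreserve1,2019\n")
    metadata = load_metadata(meta_csv, "file")

    tokens = preprocess("The Reserve, in 2019, is growing!")
    record = {"tokens": tokens, **metadata.get("reserve1", {})}
    print(record)
    # {'tokens': ['reserve', '2019', 'growing'], 'file': 'reserve1', 'year': '2019'}
```

Note that string.punctuation also removes apostrophes and hyphens, so "don't" becomes "dont"; if that matters, build the translation table from a narrower set of characters.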

Can someone help me please? This is my code so far:

import PyPDF2
import re

for k in range(1,312):
    # open the pdf file
    object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))
    

And this is what happens:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [37], in <cell line: 4>()
      2 import re
      4 for k in range(1,312):
      5     # open the pdf file
----> 6     object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve" % (k))

TypeError: not all arguments converted during string formatting

Please help, I don't have time for this.

Answer:

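The string on the left of % contains no %s placeholder, so Python has nowhere to put k and raises "not all arguments converted during string formatting". Add a placeholder where the number should go (%s already converts k to text, so the str() call is optional):

object = PyPDF2.PdfFileReader("/Users/n_n/Desktop/Digitalization/reserve%s" % str(k))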
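A sketch of the whole loop, under stated assumptions: the files are assumed to be named reserve1.pdf to reserve311.pdf inside the Digitalization folder (the .pdf extension and the helper names pdf_path and read_pdf_text are hypothetical). It builds each filename with an f-string instead of % formatting and reads the text with PdfReader from pypdf, the continuation of PyPDF2; newer PyPDF2 releases deprecate PdfFileReader in favour of PdfReader.

```python
from pathlib import Path

# Assumption: the PDFs live here and are named reserve1.pdf, reserve2.pdf, ...
FOLDER = Path("/Users/n_n/Desktop/Digitalization")


def pdf_path(k: int) -> Path:
    """Build the (hypothetical) filename for document number k."""
    # The f-string inserts k directly, so no % placeholder is needed.
    return FOLDER / f"reserve{k}.pdf"


def read_pdf_text(path: Path) -> str:
    """Return the text of every page of one PDF, joined by newlines."""
    # Imported here so pdf_path() can be used even where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


if __name__ == "__main__":
    texts = {}
    for k in range(1, 312):
        path = pdf_path(k)
        if not path.exists():
            print(f"skipping missing file: {path}")
            continue
        texts[k] = read_pdf_text(path)
    print(f"read {len(texts)} PDFs")
```

If the real filenames do not follow a numeric pattern, looping over sorted(FOLDER.glob("*.pdf")) avoids guessing the names at all.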
