Flask uploaded file encoding/decoding for PyPDF2

Time:01-18

I am building a Flask app that needs to read an uploaded file. Because a new file is uploaded every day, I do not want to save it on the server; I only want to read it when it is uploaded. I can already get the file from the HTML form:

   if request.method == 'POST':
       file = request.files.get('file')

This returns a FileStorage object. I then want to read the file, so I tried using PyPDF2:

    if file:
        pdfFileObj = open(file, 'rb')
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        print(pdfReader.numPages)
        pageObj = pdfReader.getPage(0)
        print(pageObj.extractText())
        pdfFileObj.close()

This results in an error:

TypeError: expected str, bytes or os.PathLike object, not FileStorage

I tried calling read(), and I also tried encoding and decoding in different formats. I converted it to bytes and to binary, and neither worked. I would appreciate a way to efficiently convert the FileStorage into something PyPDF2 can read. Any help would be greatly appreciated. Thank you.

CodePudding user response:

The error comes from pdfFileObj = open(file, 'rb'): you are passing open() an argument it does not accept (a FileStorage).

As the error says, open() only accepts a str, bytes, or os.PathLike object.
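
To see why the call fails, here is a minimal sketch that uses only the standard library. FakeUpload is a hypothetical stand-in for Flask's FileStorage (an object that has read() but is not a path); it is not part of Flask.

```python
import io


class FakeUpload:
    """Hypothetical stand-in for an upload: readable, but not a file path."""

    def __init__(self, data: bytes):
        self.stream = io.BytesIO(data)

    def read(self, size=-1):
        return self.stream.read(size)


def try_open(obj):
    """Return the TypeError message open() raises for obj, or None if it opens."""
    try:
        handle = open(obj, "rb")
    except TypeError as exc:
        return str(exc)
    handle.close()
    return None


# open() wants a path, so handing it a file-like object fails
message = try_open(FakeUpload(b"%PDF-1.4"))
print(message)
```

The object already behaves like an open file, so there is nothing left for open() to do.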

What you can do instead is pass the file object straight to the PDF reader, and it will work.

from flask import Flask, request
import PyPDF2

app = Flask(__name__)

@app.route('/', methods=['POST'])
def file_upload():

    file = request.files.get('file')

    if not file:
        return 'No file uploaded.'

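    # FileStorage is already a file-like object, so pass it straight in
    pdfReader = PyPDF2.PdfReader(file)

    # Number of pages in the document
    print(len(pdfReader.pages))
    # Text of the first page
    pageObj = pdfReader.pages[0]
    print(pageObj.extract_text())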

    return 'success'

Also a couple of notes:

  • You can pass the allowed methods to the Flask route decorator, so you do not need to check request.method inside the view.
  • In PyPDF2 3.0, most of the functions and classes you are using were deprecated and removed. If you are on 3.0 or above, calling them raises an error that tells you the correct function or class to use.
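
If the code has to run against both naming schemes, one rough option is to look up the reader class by name. This is only a sketch: pick_reader_class is a hypothetical helper, it assumes the module exposes either PdfReader (newer releases) or only PdfFileReader (older ones), and the stand-in modules exist purely so the example runs without PyPDF2 installed.

```python
from types import SimpleNamespace


def pick_reader_class(pdf_module):
    """Return the reader class exposed by the given PDF module.

    Prefers the newer name PdfReader and falls back to PdfFileReader.
    """
    reader = getattr(pdf_module, "PdfReader", None)
    if reader is None:
        reader = getattr(pdf_module, "PdfFileReader")
    return reader


# Stand-ins for an old and a new library release (illustration only)
class OldReader:
    pass


class NewReader:
    pass


old_module = SimpleNamespace(PdfFileReader=OldReader)
new_module = SimpleNamespace(PdfFileReader=OldReader, PdfReader=NewReader)

print(pick_reader_class(old_module).__name__)
print(pick_reader_class(new_module).__name__)
```

In an app this would be called as pick_reader_class(PyPDF2)(file). The page-access names were renamed as well, so pinning a single library version is usually the simpler route.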

CodePudding user response:

The error comes from trying to open() the FileStorage; it is already a file-like object.

You should be able to pass it directly to PyPDF2.PdfFileReader.

    if file:
        pdfReader = PyPDF2.PdfFileReader(file)
        print(pdfReader.numPages)
        pageObj = pdfReader.getPage(0)
        print(pageObj.extractText())
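
If the reader needs a seekable stream, or you want to be sure the upload stays in memory, one option is to copy the bytes into an io.BytesIO buffer first. A minimal sketch: to_buffer is a hypothetical helper, and the BytesIO object standing in for the upload is only there so the example runs without Flask.

```python
import io


def to_buffer(upload):
    """Copy an uploaded file-like object into a seekable in-memory buffer."""
    data = upload.read()     # raw bytes of the upload
    return io.BytesIO(data)  # file-like and seekable, never written to disk


# Stand-in for request.files.get('file'); any object with read() works
fake_upload = io.BytesIO(b"%PDF-1.4 example bytes")

buffer = to_buffer(fake_upload)
print(buffer.seekable())
print(len(buffer.getvalue()))
```

The buffer can then be handed to the reader in place of the upload, for example PyPDF2.PdfReader(buffer). This also explains why read() on its own did not help: it returns raw bytes, while the reader expects a path or a file-like object.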