Batch convert PDFs to CSVs

What am I doing wrong? Here is the code that I attempted:

import glob
import tabula

for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    tabula.convert_into(filepath, pages="all", output_format='csv')

Error:

TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 6>()
      5 # transform the pdfs into excel files
      6 for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
----> 7     tabula.convert_into(filepath, pages="all", output_format='csv')

TypeError: convert_into() missing 1 required positional argument: 'output_path'

Answer:

This reads every PDF file in your Downloads folder and writes the tables from each one to a CSV file. The key change is passing output_path, which convert_into() requires.

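import os
import glob
import tabula

path = "/Users/username/Downloads/"
for filepath in glob.glob(os.path.join(path, "*.pdf")):
    # Name the CSV after the PDF, dropping the ".pdf" extension
    name = os.path.splitext(os.path.basename(filepath))[0]
    tabula.convert_into(input_path=filepath,
                        output_path=os.path.join(path, name + ".csv"),
                        output_format="csv",
                        pages="all")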
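The same loop can also be written with pathlib. This is a minimal sketch built around a hypothetical helper, batch_convert (it is not part of tabula-py), that writes one CSV per PDF, either next to the PDFs or into a separate folder. Path.stem drops the .pdf extension, whereas os.path.basename keeps it. By default the helper calls tabula.convert_into; the convert parameter is an assumption added only so the loop can be tried with a stand-in function.

```python
from pathlib import Path


def batch_convert(pdf_dir, out_dir=None, convert=None):
    """Write one CSV per PDF found in pdf_dir (hypothetical helper).

    out_dir defaults to pdf_dir. convert is called as
    convert(input_path, output_path, output_format="csv", pages="all")
    and defaults to tabula.convert_into.
    """
    if convert is None:
        import tabula  # imported here so the helper can be tried without tabula
        convert = tabula.convert_into

    pdf_dir = Path(pdf_dir)
    out_dir = Path(out_dir) if out_dir is not None else pdf_dir
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        csv_path = out_dir / (pdf.stem + ".csv")  # "report.pdf" -> "report.csv"
        convert(str(pdf), str(csv_path), output_format="csv", pages="all")
        written.append(csv_path)
    return written


# Example (needs tabula-py and Java installed):
# batch_convert("/Users/username/Downloads/")
```

Passing a second argument, such as batch_convert(pdf_folder, csv_folder), sends the CSVs to another folder instead.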

Answer:

It appears you have not defined output_path, the location where the converted CSV should be written; convert_into() requires it.

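import glob
import os
import tabula

out_dir = "C:/Users/username/Downloads/new Folder with CSvs"
os.makedirs(out_dir, exist_ok=True)  # create the output folder if it does not exist yet
for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    # output_path names a file, so give each PDF its own CSV inside out_dir
    csv_name = os.path.splitext(os.path.basename(filepath))[0] + ".csv"
    tabula.convert_into(filepath, os.path.join(out_dir, csv_name), pages="all", output_format='csv')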
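To see why output_path is the missing piece: the tabula-py documentation lists the signature of convert_into as starting convert_into(input_path, output_path, output_format='csv', ...), so output_path has no default, and output_format only picks the file type, not where the file goes. The sketch below reproduces the question's TypeError with a hypothetical stub that shares those leading parameters; the stub converts nothing.

```python
def convert_into(input_path, output_path, output_format="csv", **kwargs):
    """Hypothetical stub with the same leading parameters as tabula's convert_into."""
    return input_path, output_path, output_format


try:
    # Same shape as the question's call: output_path is never given
    convert_into("report.pdf", pages="all", output_format="csv")
    message = "no error"
except TypeError as exc:
    message = str(exc)

print(message)  # convert_into() missing 1 required positional argument: 'output_path'

# Supplying output_path (positionally or by keyword) satisfies the signature
result = convert_into("report.pdf", "report.csv", pages="all", output_format="csv")
print(result)  # ('report.pdf', 'report.csv', 'csv')
```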