How to convert a particular sheet in an Excel file to PDF using Python

I have a directory of Excel files and, for each file, the name of one sheet that has to be converted to PDF. My code needs to open each Excel file, find that particular sheet, and convert only that sheet to PDF. Can anybody suggest which library to use and an approach for this? In particular, how can I use a variable that holds the required sheet names for all the Excel files as the argument that selects which sheet to open? Thank you.

INPUT: file1.xls file2.xls file3.xls

sheets in file1: Title, Contents, Summary

sheets in file2: Title, Contents, Summary

sheets in file3: Title, Contents, Summary

Required sheet in file1: Title

Required sheet in file2: Contents

Required sheet in file3: Summary

OUTPUT:

file1_Title.pdf

file2_Contents.pdf

file3_Summary.pdf

Approach: I have a Python list of all the sheets in each Excel file, and a Python list containing the required sheet to be converted.

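import xlrd
# PathforInputFile: path to one workbook; line: one tab-separated line of the input list
book = xlrd.open_workbook(PathforInputFile)
AllSheets = book.sheet_names()       # every sheet name in the workbook
RequiredSheet = line.split("\t")     # required sheet name(s) for this workbook
print(AllSheets)
print(RequiredSheet)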

Code Output:

['Title', 'Contents', 'Summary']

['Title']

['Title', 'Contents', 'Summary']

['Contents']

['Title', 'Contents', 'Summary']

['Summary']

CodePudding user response：

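openpyxl and aspose-cells seem to be the most relevant, or at least the best general-purpose Excel options that I could find. Note that openpyxl reads and writes .xlsx workbooks but does not render PDF on its own, so the PDF export itself would come from aspose-cells.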

This is an article I found: https://blog.aspose.com/2021/04/02/convert-excel-files-to-pdf-in-python/

I would also recommend reading the documentation of the two libraries I suggested. It should get you on the right track.
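Whichever library ends up doing the conversion, the pairing of each workbook with its required sheet, and the output names from the question (file1_Title.pdf and so on), can be sketched in plain Python first. This is only a minimal sketch: `required_sheets`, `pdf_name` and `plan_exports` are hypothetical names, not part of openpyxl or aspose-cells, and the sheet lists are assumed to come from something like xlrd's `book.sheet_names()`.

```python
from pathlib import Path

# Hypothetical mapping: workbook file -> the one sheet to export.
required_sheets = {
    "file1.xls": "Title",
    "file2.xls": "Contents",
    "file3.xls": "Summary",
}


def pdf_name(workbook_path, sheet_name):
    """Build the output name, e.g. file1.xls + Title -> file1_Title.pdf."""
    return f"{Path(workbook_path).stem}_{sheet_name}.pdf"


def plan_exports(required, available):
    """Return (workbook, sheet, pdf) triples, skipping sheets that do not exist.

    `available` maps each workbook to the list of sheet names it contains.
    """
    jobs = []
    for workbook, sheet in required.items():
        if sheet in available.get(workbook, []):
            jobs.append((workbook, sheet, pdf_name(workbook, sheet)))
    return jobs
```

Each triple can then be handed to the actual conversion call, so the sheet name is passed as a variable rather than hard-coded per file.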

CodePudding user response：

For going through a directory of files, use glob:

import os
import glob2
root_dir = "<root directory path without files>"  # placeholder: set this to your directory
for f_csv in glob2.iglob(os.path.join(root_dir, '*.csv')): # '*.csv' can be changed to the file extension of choice like .xlsx, etc.
    # run your ops here per file
    print(f_csv)

Then you can add the per-file work inside the loop, so the same steps are not written out again for every file of the same type. I used openpyxl and pandas, but if you open the worksheet with xlrd instead (for example with sheet_by_index(0)), you can pick up right where I left off:

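    import os
    import glob2
    import pandas as pd
    from openpyxl import load_workbook

    root_dir = "<root directory path without files>"  # placeholder: set this to your directory
    # openpyxl reads .xlsx workbooks, not .csv files, so match '*.xlsx' here
    for f_xlsx in glob2.iglob(os.path.join(root_dir, '*.xlsx')):
        wb = load_workbook(f_xlsx)

        # Access the worksheet named 'no_header'
        ws = wb['no_header']

        # Convert to DataFrame
        df = pd.DataFrame(ws.values)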

The last part can be done in different ways, but I like to convert the sheet to a pandas DataFrame and then use df.to_html() to get it onto a web page for download.

df.to_html(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)

I would read the docs for pandas.DataFrame.to_html() if the arguments don't make sense or you want to customize the call.
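As a rough illustration of that last step, here is a minimal sketch that reads one named sheet with pandas and writes it out as HTML. The helper names `frame_to_html` and `sheet_to_html` and the `<workbook>_<sheet>.html` naming are my own assumptions, mirroring the output names in the question. Keep in mind that this produces HTML only; turning it into a PDF would need a separate HTML-to-PDF step.

```python
from pathlib import Path

import pandas as pd


def frame_to_html(df, out_path):
    """Write a DataFrame to an HTML file and return the path written."""
    out_path = Path(out_path)
    html = df.to_html(index=False)  # index=False drops the row-number column
    out_path.write_text(html, encoding="utf-8")
    return out_path


def sheet_to_html(workbook_path, sheet_name, out_dir="."):
    """Read one named sheet and save it as <workbook>_<sheet>.html."""
    # read_excel needs an engine installed: openpyxl for .xlsx, xlrd for .xls
    df = pd.read_excel(workbook_path, sheet_name=sheet_name)
    out_name = f"{Path(workbook_path).stem}_{sheet_name}.html"
    return frame_to_html(df, Path(out_dir) / out_name)
```

Calling `sheet_to_html("file1.xls", "Title")` would, under these assumptions, write file1_Title.html next to the script.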

CodePudding user response：

You can use aspose-cells. Please see this URL: https://blog.aspose.com/2021/04/02/convert-excel-files-to-pdf-in-python/