How could I adjust this code so that the function loops through the list `models_2`? If I pass `models` to the function it works, but if I change it to `models_2` I get this error:
AttributeError: 'float' object has no attribute 'seek'
This is my DataFrame, read from an Excel file in which every cell is formatted as "text":

            MOD1       MOD2       MOD3       MOD4
    0  File1.pdf  File3.pdf  File1.pdf  File3.pdf
    1  File2.pdf        NaN  File2.pdf  File3.pdf
    2  File3.pdf        NaN        NaN        NaN
    from PyPDF2 import PdfFileMerger

    # df is the DataFrame shown above, read from the Excel file
    models = ['MOD1']
    models_2 = ['MOD1', 'MOD2']

    def merge_pdf(models):
        merger = PdfFileMerger()
        for name in models:
            for index, row in df.iterrows():
                merger.append(row[name])
        merger.write(f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf")
        merger.close()

    merge_pdf(models)
The full error message:
    PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [_reader.py:1065]
    Traceback (most recent call last):
      File "Z:\PyCharm\Excel_Reader\Excel_Reader.py", line 30, in <module>
        merge_pdf(models)
      File "Z:\PyCharm\Excel_Reader\Excel_Reader.py", line 27, in merge_pdf
        merger.append(row[name])
      File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\merger.py", line 227, in append
        self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
      File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\merger.py", line 149, in merge
        pdfr = PdfFileReader(
      File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\_reader.py", line 239, in __init__
        self.read(stream)
      File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\_reader.py", line 911, in read
        stream.seek(-1, 2)
    AttributeError: 'float' object has no attribute 'seek'
Answer:
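Your code fails because column 'MOD2' contains NaN values, which are of type float. Even though the Excel cells are formatted as text, empty cells are read into pandas as NaN. `PdfFileMerger.append()` expects a file path or a file-like object, so when it receives a float it tries to call `.seek()` on it, which raises the AttributeError. How you handle this depends on what you want to do with those NaN values.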
You can verify that by running the following code:
    import numpy as np
    import pandas as pd

    # Rebuild the question's DataFrame; empty cells are NaN
    data = {
        'MOD1': ['File1.pdf', 'File2.pdf', 'File3.pdf'],
        'MOD2': ['File3.pdf', np.nan, np.nan],
        'MOD3': ['File1.pdf', 'File2.pdf', np.nan],
        'MOD4': ['File3.pdf', 'File3.pdf', np.nan],
    }
    df = pd.DataFrame(data)

    models_2 = ['MOD1', 'MOD2']

    # Print the value and type of every cell in the selected columns
    for name in models_2:
        for index, row in df.iterrows():
            print(name, index, row[name], type(row[name]))

This will print the following:

    MOD1 0 File1.pdf <class 'str'>
    MOD1 1 File2.pdf <class 'str'>
    MOD1 2 File3.pdf <class 'str'>
    MOD2 0 File3.pdf <class 'str'>
    MOD2 1 nan <class 'float'>
    MOD2 2 nan <class 'float'>
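Because the problem cells are NaN, pandas can also detect them with `pd.isna()` or drop them with `Series.dropna()` before anything reaches the merger. A minimal sketch, assuming the same column layout as the question; the helper `files_for` is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'MOD1': ['File1.pdf', 'File2.pdf', 'File3.pdf'],
    'MOD2': ['File3.pdf', np.nan, np.nan],
})

def files_for(df, name):
    """Hypothetical helper: the non-missing file names in column `name`."""
    # dropna() removes the NaN cells, so only the file names remain
    return df[name].dropna().tolist()

print(pd.isna(df.loc[1, 'MOD2']))  # True: the empty cell is NaN
for name in ['MOD1', 'MOD2']:
    print(name, files_for(df, name))
```

Note that `dropna()` removes only missing values; a cell holding something other than NaN (for example a number) would still be passed through.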
If you only want to include cells that contain strings, add a type check before appending each value to your merger object:
    from PyPDF2 import PdfFileMerger

    models = ['MOD1']
    models_2 = ['MOD1', 'MOD2']

    def merge_pdf(models):
        merger = PdfFileMerger()
        for name in models:
            for index, row in df.iterrows():
                # Skip empty cells, which pandas reads as NaN (a float)
                if isinstance(row[name], str):
                    merger.append(row[name])
        merger.write(f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf")
        merger.close()

    merge_pdf(models_2)
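The merger in `merge_pdf` is created once, before the loop over model names, so files from every model in `models_2` are appended to the same merger object. If the goal is one PDF per model (the `({name})` in the output file name suggests this, but that is an assumption), a fresh merger can be created inside the loop. A sketch under that assumption: `collect_files` is a hypothetical helper, `df` is now passed in explicitly, and `PdfFileMerger` is the class from the question, imported inside `merge_pdf` so the grouping step can be tried without PyPDF2 installed.

```python
import numpy as np
import pandas as pd

def collect_files(df, models):
    """Hypothetical helper: map each model column to its non-missing file names."""
    return {name: df[name].dropna().tolist() for name in models}

def merge_pdf(df, models):
    # Imported here so collect_files() works even without PyPDF2 installed
    from PyPDF2 import PdfFileMerger

    for name, files in collect_files(df, models).items():
        merger = PdfFileMerger()  # a fresh merger for each model
        for path in files:
            merger.append(path)
        merger.write(f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf")
        merger.close()

df = pd.DataFrame({
    'MOD1': ['File1.pdf', 'File2.pdf', 'File3.pdf'],
    'MOD2': ['File3.pdf', np.nan, np.nan],
})
print(collect_files(df, ['MOD1', 'MOD2']))
```

With the real DataFrame and PDF files in place, `merge_pdf(df, models_2)` would then write one file per model, each containing only that column's non-empty entries.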