Why is the data in the PDF written in the 1st column?

Time:09-28

I have a PDF file called Question.pdf with the following content.

Question.pdf

I am converting my PDF file to an xlsx file using the Python tabula module. However, it writes all of the data into the first column of my Excel file. How can I delete this field (the part marked in red)?

data.xlsx

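import tabula

# read_pdf returns a list of DataFrames; [1] selects the second table found on page 1
df = tabula.read_pdf('Question.pdf', pages=1, lattice=True)[1]

# Replace carriage returns in the column names with spaces
df.columns = df.columns.str.replace('\r', ' ')
# Drop every row that contains at least one missing value
data = df.dropna()
data.to_excel('data.xlsx', index=False)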

CodePudding user response:

Try this when exporting:

data.to_excel('data.xlsx', index=False, header=False)
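To see what the header argument changes, here is a minimal sketch. It assumes the red area is the header row. The DataFrame is a made-up stand-in for the table extracted from Question.pdf, and to_csv is used instead of to_excel only so the example runs without an Excel engine; both methods accept a header parameter documented as a bool or a list of strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for the table tabula extracted from Question.pdf.
df = pd.DataFrame(
    {
        "Name\rSurname": ["Ada Lovelace", None, "Alan Turing"],
        "Score": [90, None, 85],
    }
)

# Same cleanup steps as in the question.
df.columns = df.columns.str.replace("\r", " ")
data = df.dropna()


def export_text(frame, header):
    """Write the frame to an in-memory buffer and return the text."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, header=header)
    return buf.getvalue()


with_header = export_text(data, header=True)
without_header = export_text(data, header=False)

print(with_header)
print(without_header)
```

With header=False the row of column names is omitted and only the data rows are written. If the unwanted part is a data column rather than the header row, this will not remove it.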

Hope this helps.
