I have a table in a PDF document with x, y, and z columns. I want to extract only the x column. Is this possible in Python? If so, how?
Then I want to plot x versus y using the data from the table. How do I do that?
CodePudding user response:
Step 1: Using tabula-py
The methods used in the example are:
read_pdf(): reads the tables from the PDF file at the given path
tabulate(): formats the data as a text table
Code
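from tabula import read_pdf
from tabulate import tabulate
# read_pdf returns a list of DataFrames, one per table it finds
tables = read_pdf("filename.pdf", pages="all")  # path to the PDF file
# print each extracted table with its column names as the header row
for df in tables:
    print(tabulate(df, headers="keys"))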
This prints the table data extracted from the PDF.
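Since read_pdf with pages="all" returns a list of DataFrames, the x column can be picked out with ordinary pandas indexing. Here is a minimal sketch, assuming the extracted header row really contains a column named x. The helper name first_table_with and the sample data are made up for illustration and stand in for the real output of read_pdf:

```python
import pandas as pd

def first_table_with(tables, column):
    """Return the first DataFrame in `tables` that has `column` (hypothetical helper)."""
    for df in tables:
        if column in df.columns:
            return df
    raise KeyError(f"no extracted table has a column named {column!r}")

# Stand-in for the list returned by read_pdf("filename.pdf", pages="all")
tables = [pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6], "z": [0, 0, 1]})]

x = first_table_with(tables, "x")["x"]  # a pandas Series holding only column x
print(x.tolist())
```

If the header in your PDF is extracted under a different name, print df.columns first to see what tabula actually produced.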
CodePudding user response:
You can try to implement it with tabula. It has a Python wrapper that can read tables from a PDF and convert them to pandas DataFrames.
Tabula: https://tabula.technology/
Python wrapper: https://pypi.org/project/tabula-py/
CodePudding user response:
Check out tabula-py, a PDF parsing library that extracts tables from PDF documents.
Then parse your table:
import tabula
table = tabula.read_pdf(<file path>, pages=<number of pages>)
Replace <file path> and <number of pages> with the respective values (pages takes the page number or numbers to read, or "all").
It should return a list of pandas DataFrames, from which you can extract the column that you need.
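For the plotting part of the question, one option is matplotlib. This is only a sketch under a few assumptions: the columns are named x and y, matplotlib is installed, and the sample DataFrame stands in for one element of the list returned by tabula.read_pdf. The function name plot_xy and the output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line if you prefer plt.show()
import matplotlib.pyplot as plt
import pandas as pd

def plot_xy(df, x_col="x", y_col="y", out_path="x_vs_y.png"):
    """Plot df[x_col] against df[y_col] and save the figure (hypothetical helper)."""
    # Coerce to numbers in case some cells were extracted as text
    x = pd.to_numeric(df[x_col], errors="coerce")
    y = pd.to_numeric(df[y_col], errors="coerce")
    fig, ax = plt.subplots()
    ax.plot(x, y, marker="o")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    fig.savefig(out_path)
    plt.close(fig)
    return out_path

# Stand-in for one DataFrame from the list returned by tabula.read_pdf(...)
df = pd.DataFrame({"x": ["1", "2", "3"], "y": ["2", "4", "6"], "z": ["a", "b", "c"]})
saved = plot_xy(df)
```

With real data you would pass one of the extracted tables instead, for example plot_xy(table[0]), after checking that it contains the columns you expect.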