I am trying to read an excel file which is a url. The link and the code is below:
excel = 'https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO||HX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01/01/2022&endDate=03/17/2022&format=excel&rebuild=false'
data = pd.read_excel(excel, engine='openpyxl')
I tried using openpyxl and i get the following error:
File is not a zip file
I even tried using pd.read_csv but the data is coming in html format which isn't easily readable:
df = pd.read_csv('https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO||HX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01/01/2022&endDate=03/17/2022&format=excel&rebuild=false',
sep='</tr><tr>'
)
Please help!
CodePudding user response:
The url returns an html page not an excel file.
Use:
df = pd.read_html(excel)[0]
print(df)
# Output
Commodity Name City Name ... Price Comment Comments
0 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
1 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
2 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
3 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
4 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
... ... ... ... ... ...
1368 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
1369 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
1370 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
1371 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
1372 AVOCADOS CHICAGO ... NaN Wide range in price, quality, condition and ap...
[1373 rows x 29 columns]
CodePudding user response:
You won't be able to download/read the excel file using pandas as the url link is not of direct excel file.
Instead of pd.read_excel
use below code:
import pandas as pd
df = pd.read_html(excel)[0]
print(df)