Home > Back-end >  ValueError: File is not a recognized excel file, How to read .xls filke?
ValueError: File is not a recognized excel file, How to read .xls filke?

Time:03-23

I am trying to read an excel file which is a url. The link and the code is below:


excel = 'https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO||HX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01/01/2022&endDate=03/17/2022&format=excel&rebuild=false'

data = pd.read_excel(excel, engine='openpyxl')


I tried using openpyxl and i get the following error:

File is not a zip file

I even tried using pd.read_csv but the data is coming in html format which isn't easily readable:


df = pd.read_csv('https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO||HX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01/01/2022&endDate=03/17/2022&format=excel&rebuild=false',
                 sep='</tr><tr>'
           )

Please help!

CodePudding user response:

The url returns an html page not an excel file.

Use:

df = pd.read_html(excel)[0]
print(df)

# Output
     Commodity Name City Name  ... Price Comment                                           Comments
0          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
2          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
3          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
4          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
...             ...       ...  ...           ...                                                ...
1368       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1369       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1370       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1371       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1372       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...

[1373 rows x 29 columns]

CodePudding user response:

You won't be able to download/read the excel file using pandas as the url link is not of direct excel file.

Instead of pd.read_excel use below code:

import pandas as pd
df = pd.read_html(excel)[0]
print(df)
  • Related