Home > Blockchain >  read_excel from Pandas not reading all data (missing columns from first row)
read_excel from Pandas not reading all data (missing columns from first row)

Time:12-05

I have an extremely simple .xlsx file and pandas is not reading the first row completely. It's very strange since it only reads one of the columns, and the others are blank. After A LOT of trial and error, it seems there's something hidden in the Excel file itself, since if I remove completely the row, and I just type it all again, then it works.

However, there's nothing visual that I can see. If I export the file to .csv then pandas works as well.

I'm using python 3.7 with pandas 1.1.5. I tried upgrading pandas but I can't, pip tells me I'm using the latest available version, even though I see that pandas 1.3 is available. Not sure if this is already fixed in a new version, and if it is, how do I get it installed (I'm using the app both on Mac and on Windows via Anaconda).

The xlsx file showing the problem is here:

https://docs.google.com/spreadsheets/d/1Xze2DNCyIARG7vdGFh0aUGHnhfgkciV5/edit?usp=sharing&ouid=117900420544251849196&rtpof=true&sd=true

It just contains the header and a row. That's it.

The script to read it is this:

import pandas as pd

print(f"pandas version is {pd.__version__}")
df = pd.read_excel('Book1.xlsx', dtype=str)
df = df.fillna('')
print(f"columns are {df.columns.tolist()}")
print(df)

And the output is this:

anibal@~/PycharmProjects/CIUSSS$ python3 test.py
pandas version is 1.1.5
columns are ['Source']
                                                                         Source
SNOMED CT 115161005 Genus Abiotrophia (organism) Abiotrophia  Genus Abiotrophia

Where it should be:

anibal@~/PycharmProjects/CIUSSS$ python3 test.py
pandas version is 1.1.5
columns are ['Source', 'f2', 'f3', 'f4', 'f5']
      Source         f2                            f3           f4                 f5
0  SNOMED CT  115161005  Genus Abiotrophia (organism)  Abiotrophia  Genus Abiotrophia

Can somebody please tell me if there's something different that I should be doing in the API to be able to read this? Or if I need to have a newer version of pandas, how do I get a newer version with pip (and then in anaconda)?

CodePudding user response:

Update: the issue was indeed the version. I tried the exact same file with python 3.9.9 and pandas 1.3.4 and everything looks good.

  • Related