Home > database >  Read_excel(), A column of string, but in excel format, it is date and cause error
Read_excel(), A column of string, but in excel format, it is date and cause error

Time:05-25

My data.xlsx is something like this

 Number  Grade
3217839     A
7893242     B
3782195     C
      0     D

However, in excel, the Number column text format is "Short Date", which cause the data to become

  Number  Grade
 #######     A
 #######     B
 #######     C
00-01-00     D

I used df = pd.read_csv("data.xlsx"), and it hit error and show NaN

the error (only showing A2):

UserWarning: Cell A2 is marked as a date but the serial value 321783921 is outside the limits for dates. 
The cell will be treated as an error. warn(msg)

     Number Grade
0       NaN     A
1       NaN     B
2       NaN     C
3  00:00:00     D

I have tried with df = pd.read_csv("data.xlsx", coverters={"Number": str}). The result still the same. Any way to solve this without manually changing the format in the excel?

CodePudding user response:

Excel represents dates with a "serial" number, starting from 1 for 01.01.1900 up to 2958465 for 31.12.9999. So both Pandas and Excel complain about your values because they cannot be interpreted as date. Just change the cell format,it is simply wrong.

CodePudding user response:

If you don't have the authority to modify and then save the wb, you could use xlwings to call Excel from within python. Open wb, change the number formatting of the cells to 'General', select your ranges, close wb and quit app.

# assuming data in wb.sheets[0].range('A1:B5')

import xlwings
import pandas as pd

my_file = 'data.xlsx'

excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open(my_file, read_only=True)

sht = excel_book.sheets[0]
sht.range('A1:A5').number_format = 'General'

my_cols = sht.range('A1:B1').value
my_range = sht.range('A2:B5').value

excel_book.close()
excel_app.quit()

df = pd.DataFrame(my_range, columns=my_cols)

# =============================================================================
# 
# display(df)
# 
#       Number Grade
# 0  3217839.0     A
# 1  7893242.0     B
# 2  3782195.0     C
# 3        0.0     D
# 
# =============================================================================
  • Related