Python pandas drop issue on a date header

Time:06-28

I have an Excel document that has three rows above the main header (the column names).

When I load the data into a pandas DataFrame using:

import pandas
df = pandas.read_excel('output/tracker.xlsx')
print(df)

I get this data (which is fine):


 Date/Time:13/06/2022 Unnamed: 1   Unnamed: 2 Unnamed: 3 
0                     NaN        NaN          NaN        NaN 
1                     NaN       2763         2763        NaN 
2                     NaN    Site ID  Company Site ID     Region 
3     203990318_700670803  203990318       689179   Nord-Ost

I do not need the first three rows, so I run:

df = df.iloc[2:]

It removes the rows with index 0 and 1.

But it doesn't remove the top row (Date/Time:13/06/2022, Unnamed: 1, etc.).

How do I remove that top row?
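iloc cannot reach that row because read_excel used it as the header: its cells became the column labels, not a data row. A minimal sketch reproducing this, assuming a CSV stand-in (the hypothetical string RAW below) with the same cell layout as tracker.xlsx so it runs without an Excel file; read_csv treats the default header the same way here:

```python
import io

import pandas as pd

# Hypothetical CSV stand-in for output/tracker.xlsx (same cell layout).
RAW = """Date/Time:13/06/2022,,,
,,,
,2763,2763,
,Site ID,Company Site ID,Region
203990318_700670803,203990318,689179,Nord-Ost
"""

# Default header=0: the first line is consumed as the column labels.
df = pd.read_csv(io.StringIO(RAW))
print(list(df.columns))  # the "Date/Time" line lives here, not in the rows

# iloc slices rows only, so the column labels are untouched.
trimmed = df.iloc[2:]
print(list(trimmed.columns))
print(len(df), len(trimmed))
```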

Answer:

Instead, load the data directly without the unneeded rows, using the skiprows parameter of pandas.read_excel:

df = pandas.read_excel('output/tracker.xlsx', skiprows=3)
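A sketch of the skiprows approach, again assuming a hypothetical CSV stand-in (RAW) with the same layout as tracker.xlsx; with the real file the call is the read_excel line above. Skipping three lines makes the "Site ID" line the header:

```python
import io

import pandas as pd

# Hypothetical CSV stand-in for output/tracker.xlsx (same cell layout).
RAW = """Date/Time:13/06/2022,,,
,,,
,2763,2763,
,Site ID,Company Site ID,Region
203990318_700670803,203990318,689179,Nord-Ost
"""

# Skip the Date/Time line, the empty line and the 2763 line;
# the next line becomes the header.
df = pd.read_csv(io.StringIO(RAW), skiprows=3)
print(list(df.columns))
print(df)
```

The first column has no header cell in the sheet, so pandas labels it "Unnamed: 0"; rename it afterwards if you need a meaningful name.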

Answer:

Regarding "I get this data (which is fine)":

By default, pandas.read_excel assumes the first row is the header, i.e. that it holds the column names. Judging from your data snippet, that is not the case here. Use header=None to tell pandas that the first row contains data rather than column names:

import pandas
df = pandas.read_excel('output/tracker.xlsx',header=None)
print(df)

Then the Date/Time row is an ordinary data row at index 0, and you can drop the unwanted rows with iloc as you already did.
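With header=None, the Date/Time line is an ordinary row at index 0, so positional slicing can drop it. A sketch under the same assumption of a hypothetical CSV stand-in (RAW) for tracker.xlsx; promoting the "Site ID" row to column labels, the header_row variable and the "row_id" label are hypothetical additions, not part of the answer:

```python
import io

import pandas as pd

# Hypothetical CSV stand-in for output/tracker.xlsx (same cell layout).
RAW = """Date/Time:13/06/2022,,,
,,,
,2763,2763,
,Site ID,Company Site ID,Region
203990318_700670803,203990318,689179,Nord-Ost
"""

# header=None: every line, including the Date/Time line, is a data row.
raw = pd.read_csv(io.StringIO(RAW), header=None)
print(raw.iloc[0, 0])  # the Date/Time text is now an ordinary cell

# Hypothetical follow-up: row 3 holds the real column names.
header_row = 3
df = raw.iloc[header_row + 1:].reset_index(drop=True)
# The first header cell is empty (NaN); give it a placeholder label.
df.columns = raw.iloc[header_row].fillna("row_id")
print(df)
```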
