I have an Excel document that has three rows ahead of the main header (the column names).
When loading the data into a pandas DataFrame using:
import pandas
df = pandas.read_excel('output/tracker.xlsx')
print(df)
I get this data (which is fine):
  Date/Time:13/06/2022 Unnamed: 1      Unnamed: 2 Unnamed: 3
0                  NaN        NaN             NaN        NaN
1                  NaN       2763            2763        NaN
2                  NaN    Site ID Company Site ID     Region
3  203990318_700670803  203990318          689179   Nord-Ost
I do not need the first three rows, so I run:
df = df.iloc[2:]
This removes the rows with index 0 and 1.
But it doesn't remove the top row (Date/Time:13/06/2022, Unnamed: 1, etc.).
How do I remove that top row?
CodePudding user response:
Rather, load the data directly without the unwanted rows by using the skiprows parameter of pandas.read_excel:
df = pandas.read_excel('output/tracker.xlsx', skiprows=3)
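To see what skiprows does without needing the actual workbook, here is a rough sketch that uses pandas.read_csv on an in-memory string as a stand-in for the Excel file (read_csv accepts a skiprows parameter too). The values are copied from the question; the CSV layout and the variable name raw are assumptions made only for illustration.

```python
import io
import pandas

# Stand-in for 'output/tracker.xlsx': three unwanted rows, then the real header.
# The CSV layout is an assumption so the example runs without an Excel file.
raw = (
    "Date/Time:13/06/2022,,,\n"
    ",,,\n"
    ",2763,2763,\n"
    ",Site ID,Company Site ID,Region\n"
    "203990318_700670803,203990318,689179,Nord-Ost\n"
)

# skiprows=3 discards the first three lines; the fourth line becomes the header.
df = pandas.read_csv(io.StringIO(raw), skiprows=3)
print(df.columns.tolist())
print(df)
```

The first header cell is empty in the sample, so pandas will give that column a generated name; the other three columns take their names from the real header row.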
CodePudding user response:
Regarding "I get this data (which is fine)": by default, pandas.read_excel assumes the first row is the header, i.e. that it holds the column names. Judging by the snippet of your data, that is not the case here. Use header=None to tell pandas that the first row contains data rather than column names:
import pandas
df = pandas.read_excel('output/tracker.xlsx',header=None)
print(df)
Then the top row becomes an ordinary row (index 0), and you should be able to remove it by slicing, as you already did.
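If you go the header=None route, one possible follow-up is to promote the real header row to column names after slicing. The sketch below builds the frame by hand to mimic what read_excel(..., header=None) would plausibly return for the data in the question; the hand-built frame and the names header_row and cleaned are assumptions, not part of the original answer.

```python
import pandas

# In-memory stand-in for pandas.read_excel(..., header=None): every sheet row is data.
# Values are copied from the question; building the frame by hand is an assumption.
df = pandas.DataFrame([
    ["Date/Time:13/06/2022", None, None, None],
    [None, None, None, None],
    [None, 2763, 2763, None],
    [None, "Site ID", "Company Site ID", "Region"],
    ["203990318_700670803", 203990318, 689179, "Nord-Ost"],
])

header_row = 3  # zero-based position of the real header row (hypothetical name)

# Keep only the rows below the header, then use the header row as column names.
cleaned = df.iloc[header_row + 1:].reset_index(drop=True)
cleaned.columns = df.iloc[header_row].tolist()
print(cleaned)
```

This is more work than skiprows, but it can be handy if you also want to inspect the rows above the header (for example the Date/Time cell) before discarding them.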