Home > Enterprise >  Reading excel data starting from a specific row
Reading excel data starting from a specific row

Time:10-16

I am trying to read an excel data set starting from a specific row in a workbook containing multiple sheets. I can't use the row index as the location of the row I want to start with changes in each sheet.

My data looks like this:

A header Another header
start after this A
07:00 B
08:00 C
09:00 D

and I wanted to read the data starting from whatever row is before 08:00

A header Another header
07:00 B
08:00 C
09:00 D

I have tried the following but I am getting an error that it can't find "08:00" in the list

df = df[df.index[list(df.index.values).index('08:00') - 1:]]

I would appreciate any hints, many thanks in advance!

CodePudding user response:

Compare values of index converted to Series, so possible add Series.shift for previous match and Series.cummax for next Trues:

print (df)
                 Another header
A header                       
start after this              A
07:00                         B
08:00                         C
09:00                         D

print (df[(df.index.to_series() == '08:00').shift(-1,fill_value=False).cummax()])
         Another header
A header               
07:00                 B
08:00                 C
09:00                 D

print (df[(df.index.to_series() == '07:00').shift(-1,fill_value=False).cummax()])
                 Another header
A header                       
start after this              A
07:00                         B
08:00                         C
09:00                         D

print (df[(df.index.to_series() == '09:00').shift(-1,fill_value=False).cummax()])
         Another header
A header               
08:00                 C
09:00                 D

print (df[(df.index.to_series() == '17:00').shift(-1,fill_value=False).cummax()])
Empty DataFrame
Columns: [Another header]
Index: []

CodePudding user response:

I would write the iteration out explicitly to make things clear:

start = 0 # in case we fail to find it later
for i, row in df["Col 1 header"].iteritems():
    if "07:00" in row:
        start = i -1
        break

Then you can just use start from then on.

  • Related