I am trying to read an excel data set starting from a specific row in a workbook containing multiple sheets. I can't use the row index as the location of the row I want to start with changes in each sheet.
My data looks like this:
A header | Another header |
---|---|
start after this | A |
07:00 | B |
08:00 | C |
09:00 | D |
and I wanted to read the data starting from whatever row is before 08:00
A header | Another header |
---|---|
07:00 | B |
08:00 | C |
09:00 | D |
I have tried the following but I am getting an error that it can't find "08:00" in the list
df = df[df.index[list(df.index.values).index('08:00') - 1:]]
I would appreciate any hints, many thanks in advance!
CodePudding user response:
Compare values of index
converted to Series
, so possible add Series.shift
for previous match and Series.cummax
for next True
s:
print (df)
Another header
A header
start after this A
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '08:00').shift(-1,fill_value=False).cummax()])
Another header
A header
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '07:00').shift(-1,fill_value=False).cummax()])
Another header
A header
start after this A
07:00 B
08:00 C
09:00 D
print (df[(df.index.to_series() == '09:00').shift(-1,fill_value=False).cummax()])
Another header
A header
08:00 C
09:00 D
print (df[(df.index.to_series() == '17:00').shift(-1,fill_value=False).cummax()])
Empty DataFrame
Columns: [Another header]
Index: []
CodePudding user response:
I would write the iteration out explicitly to make things clear:
start = 0 # in case we fail to find it later
for i, row in df["Col 1 header"].iteritems():
if "07:00" in row:
start = i -1
break
Then you can just use start
from then on.