Say I have the following Excel file:
A | B | C |
---|---|---|
1 | 11 | 111 |
2 | 22 | start |
3 | 33 | 333 |
4 | 44 | 444 |
I want to read the file in a dataframe making sure that I start to read it above the row where the Start value is. I tried the following:
df=df[(df.iloc[:, 2] == "start").shift(-1,fill_value=False).cummax()]
However, not only the row where start is located changes but also the col between the sheet in the workbook. Is there a way where I can look up the value I need to start with by using the col and row location?
that is what I want to have:
C |
---|
111 |
start |
333 |
444 |
Thanks!
CodePudding user response:
Try:
#index of first row above "start"
row_index = df[df.shift(-1).eq("start").any(axis=1)].index.min()
#name of first column before the column that contains "start"
col_index = df.loc[:,df.shift(-1, axis=1).eq("start").any(0)].columns[0]
#select all rows and columns per the above indices.
>>> df.loc[row_index:, col_index:]
B C
0 11 111
1 22 start
2 33 333
3 44 444