Index out of bounds when dropping rows in a dataframe

I can't understand why I get the error "IndexError: index 159 is out of bounds for axis 0 with size 159" while dropping a list of rows from a dataframe.

#Import file Excel
xls = pd.ExcelFile(file_path)
#Parse away the first 5 rows
df = xls.parse('Daten', skiprows=5, index_col=None, na_values=['NA'])
# Select row where value in column "Punktrolle_SO" is not 'UK_Schwelle_Wehr_Blockrampe'
row_numbers = [x + 1 for x in df[df['Punktrolle_SO'] != 'UK_Schwelle_Wehr_Blockrampe'].index]
#Changing the index to skip the index 0
df.index = df.index + 1
#Dropping the rows where the data are not 'UK_Schwelle_Wehr_Blockrampe'
dataframe = df.drop(df.index[row_numbers], inplace=True)

The list row_numbers contains the correct 156 values and the dataframe index goes from 1 to 159 so why do I get an IndexError?

runfile('O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses/ReadMultileFilesInOne.py', wdir='O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses')
Traceback (most recent call last):

  File "O:\GIS\GEP\Risikomanagement\Flussvermessung\ALD\Analyses\ReadMultileFilesInOne.py", line 73, in <module>
    dataframe = df.drop(df.index[row_numbers], inplace=True)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 708, in __getitem__
    return super().__getitem__(key)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3941, in __getitem__
    result = getitem(key)

IndexError: index 159 is out of bounds for axis 0 with size 159

Can anyone help me see what I am doing wrong?

Thank you,

Davide

I expect a dataframe containing the rows of the Excel file where the value in the column "Punktrolle_SO" is equal to 'UK_Schwelle_Wehr_Blockrampe'.

CodePudding user response：

Wouldn't it be simpler to just keep the rows that contain UK_Schwelle_Wehr_Blockrampe, like this?

df[df["Punktrolle_SO"].str.contains("UK_Schwelle_Wehr_Blockrampe")]
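Note that str.contains does a substring (by default regex) match, and it may produce missing values in the mask unless na=False is passed. Since the question asks for rows where the column is equal to the string, a plain comparison may be the closer fit. A minimal sketch, assuming a small made-up dataframe in place of the Excel sheet (only the column name comes from the question; the "value" column and the other entries are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; only the column name is from the question.
df = pd.DataFrame({
    "Punktrolle_SO": ["UK_Schwelle_Wehr_Blockrampe", "Sohle", None, "UK_Schwelle_Wehr_Blockrampe"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Exact match: keep only rows whose value equals the target string.
# Missing values compare as False, so those rows are simply left out.
target = "UK_Schwelle_Wehr_Blockrampe"
kept = df[df["Punktrolle_SO"] == target]

print(kept)
```

This keeps the matching rows directly, so no list of row numbers and no call to drop() is needed.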

CodePudding user response：

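A dataframe with 159 rows has positions 0 to 158, because positional indexing starts at 0 instead of 1. Reassigning df.index changes the labels (now 1 to 159) but not the positions, and df.index[row_numbers] looks entries up by position. You are trying to access a position one higher than the maximum.

So for this lookup the dataframe does not go from 1 to 159; it goes from 0 to 158, and 159 is out of bounds. Either keep the numbers 0-based when indexing with df.index[...], or pass the labels in row_numbers directly to df.drop().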
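The difference between labels and positions can be reproduced on a small scale. The following is only a sketch, assuming a made-up 5-row dataframe in place of the 159-row sheet (the values "a" to "d" and the names row_labels and result are hypothetical):

```python
import pandas as pd

# Hypothetical 5-row frame standing in for the 159-row sheet.
df = pd.DataFrame({"Punktrolle_SO": ["a", "UK_Schwelle_Wehr_Blockrampe", "b", "c", "d"]})
df.index = df.index + 1  # labels are now 1..5, positions are still 0..4

# Labels of the rows to remove, taken after the shift so they are already 1-based.
row_labels = list(df[df["Punktrolle_SO"] != "UK_Schwelle_Wehr_Blockrampe"].index)

# df.index[row_labels] would treat 5 as a position and raise IndexError,
# so the labels are passed to drop() directly instead.
result = df.drop(row_labels)

print(result)
```

Also note that drop(..., inplace=True) returns None, so assigning its result to a variable, as in the question, would leave that variable empty. Use either inplace=True without an assignment or, as here, an assignment without inplace.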