Drop pandas rows by index name

Time:02-12

I have a dataframe of, let's say, 1000 rows. I want to drop each row whose position is stored in the variable breakval, along with the next 10 rows.

breakval = [10, 100, 500]
for i in breakval:
    df = df.drop(df.index[i:i + 10])

After the first iteration, rows 10-20 are dropped. In the next iteration I want to drop rows 100-110 by index name, not by row number: the row numbers have shifted by 10, but the index names remain the same. I am looking for a way to select index values by name. For example, df.index[12] still returns a value even though I dropped that row. To work around this I used df.loc[22].name, but that only works for a single value.
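The mismatch between row position and index name can be reproduced with a small sketch. The dataframe below is a hypothetical stand-in (a single "value" column with the default index), not the asker's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the 1000-row dataframe from the question.
df = pd.DataFrame({"value": range(1000)})

# Drop the rows at positions 10..19; their labels (10..19) disappear,
# while every remaining row keeps its original label.
df = df.drop(df.index[10:20])

label_at_position_12 = df.index[12]   # positional lookup into the index
row_with_label_22 = df.loc[22]        # label-based lookup of a single row
same_row = label_at_position_12 == row_with_label_22.name
```

After the drop, position 12 holds the row labelled 22, which is why slicing df.index by position in later iterations selects different rows than intended.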

CodePudding user response:

import numpy as np

# Helper column that remembers each row's original position
df['row_index'] = np.arange(df.shape[0])
breakval = [10, 100, 500]
consecutive_entries_to_remove = 10
for i in breakval:
    for j in range(0, consecutive_entries_to_remove):
        # Drop the row whose original position was i + j
        df.drop(df.index[df['row_index'] == i + j], inplace=True)

# Remove the helper column 'row_index'
df.drop(['row_index'], axis=1, inplace=True)

CodePudding user response:

One solution is to sort breakval in decreasing order, so that breakval = [500, 100, 10], and do the same dropping: 500 first, then 100, then 10.

Each drop still shifts row positions, but only for the rows after the dropped block. Dropping positions 500 to 510 first changes the positions of the later rows only; the positions of the rows before 500 are unchanged.

Likewise, dropping 100 to 110 affects only the rows after it, not those before 100, and the same holds for 10 to 20.
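A minimal sketch of this reverse-order approach, assuming a hypothetical 1000-row dataframe with the default index and non-overlapping blocks (the name n_to_drop is introduced here for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the dataframe from the question.
df = pd.DataFrame({"value": range(1000)})
breakval = [10, 100, 500]
n_to_drop = 10  # assumed block length, as in the question's loop

# Process the largest position first, so that each drop only shifts
# rows that have already been handled.
for i in sorted(breakval, reverse=True):
    df = df.drop(df.index[i:i + n_to_drop])
```

If two blocks overlapped (for example positions 10 and 15 with a length of 10), the later positional slice would pick up shifted rows, so this sketch should only be relied on when the blocks are disjoint.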

CodePudding user response:

You should:

  • first collect the index values of all rows to be dropped,
  • then remove the rows with these index values.

For generality, let's define another variable, stating how many rows should be dropped, starting from each of your locations:

nToDrop = 10

or maybe 11, if you want to drop the indicated row (e.g. 10) and the 10 rows following it.

To collect the index values, you can run e.g.:

idxToDrop = [item
    for sublist in [df.index[i:i + nToDrop] for i in breakval]
        for item in sublist]

Note that [df.index[i:i + nToDrop] for i in breakval] is a list comprehension generating a list of index slices (RangeIndex objects, if df has the default index). Conceptually it can be treated as a list of lists.

The whole expression above has the general form:

[item
    for sublist in listOfLists
        for item in sublist]

It flattens this list (of lists), getting a plain list of index values for rows to be dropped.

Finally, to get the expected result, run:

df.drop(idxToDrop, inplace=True)
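Put together, the two steps might look like the sketch below. The string labels "r0" to "r999" are a hypothetical index chosen only to make the difference between positions and index names visible; the flattening is written as a single comprehension equivalent to the one above.

```python
import pandas as pd

# Hypothetical dataframe whose index names are strings, not positions.
df = pd.DataFrame({"value": range(1000)},
                  index=[f"r{k}" for k in range(1000)])
breakval = [10, 100, 500]
nToDrop = 10

# Step 1: collect the labels while the dataframe is still untouched,
# so every positional slice refers to the original row order.
idxToDrop = [label
             for i in breakval
             for label in df.index[i:i + nToDrop]]

# Step 2: drop all collected labels in a single call.
df = df.drop(idxToDrop)
```

Because all labels are collected before anything is removed, the order of breakval does not matter here. Note that df.drop removes every row carrying a given label, so this sketch assumes the index names are unique.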

Edit

Another, possibly faster, solution is to concatenate the index slices into a single pandas Series. Because pd.concat operates on Series (or DataFrame) objects, each index slice must first be converted into a Series with to_series().

So you can generate idxToDrop also as:

idxToDrop = pd.concat([df.index[i:i + nToDrop].to_series() for i in breakval])

and then use it in df.drop as above.
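As a rough check, the sketch below builds the labels both ways on a hypothetical 1000-row dataframe and compares them. The names idxAsList, same_labels and result are introduced here for illustration, and no timing claim is made.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe from the question.
df = pd.DataFrame({"value": range(1000)})
breakval = [10, 100, 500]
nToDrop = 10

# Each index slice becomes a Series so that pd.concat can join them.
idxToDrop = pd.concat([df.index[i:i + nToDrop].to_series() for i in breakval])

# The same labels built as a flat list, for comparison only.
idxAsList = [label for i in breakval for label in df.index[i:i + nToDrop]]
same_labels = idxToDrop.tolist() == idxAsList

# The Series is passed to df.drop as a list-like of labels.
result = df.drop(idxToDrop)
```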
