I have a dataframe of, let's say, 1000 rows. I want to drop each row whose position is stored in the variable breakval,
and also the next 10 rows after it.
breakval = [10, 100, 500]
for i in breakval:
    df = df.drop(df.index[i:i + 10])
But after the first iteration, in which rows 10-20 are dropped, I want to drop rows 100-110 by index name (label), not by row number. The row numbers have shifted down by 10, but the index names remain the same.
I am looking for a way to select rows by index name. For example, df.index[12] still returns a value even though I dropped that row, because it looks up by position. To get around this I used df.loc[22].name, but that only works for a single value.
CodePudding user response:
import numpy as np

# Helper column that keeps each row's original position
df['row_index'] = np.arange(df.shape[0])
breakval = [10, 100, 500]
consecutive_entries_to_remove = 10
for i in breakval:
    for j in range(0, consecutive_entries_to_remove):
        # Match on the stored original position, not the current position
        df.drop(df.index[df['row_index'] == i + j], inplace=True)
# Remove the helper column 'row_index'
df.drop(['row_index'], axis=1, inplace=True)
CodePudding user response:
One solution is to sort breakval in decreasing order, so breakval = [500, 100, 10], and do the same dropping, but 500 first, then 100, and then 10.
Each drop still shifts row positions, but only for the rows after the dropped block. Dropping the slice 500:510 first changes the positions of the later rows and leaves the earlier ones untouched. Dropping 100:110 next again only affects the rows after 100, and the same holds for 10:20.
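A minimal sketch of this descending-order idea, assuming a default integer index and blocks that do not overlap. The dataframe contents and the name n_to_drop are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1000-row dataframe from the question.
df = pd.DataFrame({"value": np.arange(1000)})

breakval = [10, 100, 500]
n_to_drop = 10  # assumed block size, matching the slice i:i+10 in the question

# Go from the highest position down, so positions before each block stay valid.
for i in sorted(breakval, reverse=True):
    df = df.drop(df.index[i:i + n_to_drop])

print(len(df))  # 970
```

If two blocks could overlap (say breakval contained both 10 and 15), this positional approach would drop the wrong rows, so collecting labels first is probably safer in that case.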
CodePudding user response:
You should:
- first collect the index values of all rows to be dropped,
- then remove rows with these indices.
For generality, let's define another variable, stating how many rows should be dropped, starting from each of your locations:
nToDrop = 10
or maybe 11 if you want to drop the indicated row (e.g. 5) and 10 rows following it.
To collect the index values, you can run e.g.:
idxToDrop = [item
    for sublist in [df.index[i:i + nToDrop] for i in breakval]
    for item in sublist]
Note that [df.index[i:i + nToDrop] for i in breakval] is a list comprehension generating a list of RangeIndex objects (assuming the default index). Conceptually, it can be treated as a list of lists.
The whole above expression can also be presented as:
[item
    for sublist in listOfLists
    for item in sublist]
It flattens this list (of lists), getting a plain list of index values for rows to be dropped.
Finally, to get the expected result, run:
df.drop(idxToDrop, inplace=True)
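Put together, the steps above might look like the following sketch. The sample dataframe is an assumption for illustration (a default RangeIndex with 1000 rows); everything else follows the snippets above:

```python
import numpy as np
import pandas as pd

# Hypothetical 1000-row dataframe with a default RangeIndex.
df = pd.DataFrame({"value": np.arange(1000)})

breakval = [10, 100, 500]
nToDrop = 10

# Collect the labels first, while positions and labels still line up.
idxToDrop = [item
             for sublist in [df.index[i:i + nToDrop] for i in breakval]
             for item in sublist]

# Drop everything in one call, by label.
df.drop(idxToDrop, inplace=True)
```

Because all labels are gathered before anything is removed, the order of breakval does not matter here.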
Edit
Another, possibly faster, solution is to concatenate the RangeIndex objects into a single pandas Series. Since pd.concat operates on pandas objects such as Series, each source RangeIndex must first be converted to a Series.
So you can generate idxToDrop also as:
idxToDrop = pd.concat([df.index[i:i + nToDrop].to_series() for i in breakval])
and then use it in df.drop as above.
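For completeness, a small self-contained sketch of the pd.concat variant, again on a made-up 1000-row dataframe with a default index. Whether it is actually faster is not measured here:

```python
import numpy as np
import pandas as pd

# Hypothetical 1000-row dataframe with a default RangeIndex.
df = pd.DataFrame({"value": np.arange(1000)})

breakval = [10, 100, 500]
nToDrop = 10

# Each index slice becomes a Series; concat joins them into one Series of labels.
idxToDrop = pd.concat([df.index[i:i + nToDrop].to_series() for i in breakval])

# A Series is list-like, so it can be passed to drop as the labels to remove.
df.drop(idxToDrop, inplace=True)
```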