Remove multiple keys from a Pandas series

Time:11-10

I have a Pandas Series and would like to remove multiple items using a list of keys to delete, but this list includes elements that are not among the Series keys.

I can do this using the commands below, but I believe there must be a more elegant way to achieve it.

import pandas as pd

series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']

series1.drop(set(series1.keys()).intersection(set(list1)))

Result:

a    8
d    5
dtype: int64

Any ideas?

CodePudding user response:

We can filter with Index.difference, which performs the same set difference without the explicit set conversions. In Pandas, inclusive masks (selecting what to keep) tend to be shorter, and are often faster, than exclusive masks or dropping rows:

series1[series1.index.difference(list1)]
a    8
d    5
dtype: int64
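
One thing worth checking with this approach is ordering: as far as I know, Index.difference returns a sorted, de-duplicated index by default, so the result follows sorted label order rather than the original order. A minimal sketch, using a made-up series `unordered` (not from the question) whose labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical series with labels deliberately out of alphabetical order.
unordered = pd.Series({'d': 5, 'c': 6, 'b': 7, 'a': 8})
list1 = ['b', 'c', 'e', 'f']

# Labels to keep; entries of list1 that are missing from the index are ignored.
keep = unordered.index.difference(list1)
result = unordered[keep]

print(list(keep))          # the remaining labels, as returned by difference
print(list(result.index))  # the result follows the order of `keep`
```

If the original order matters, a boolean mask may be the safer choice.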

CodePudding user response:

You can use Index.isin to produce a boolean mask and then invert it:

series1 = series1[~series1.index.isin(list1)]
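
To see what the mask looks like, here is a small sketch on the series from the question. It also tries `drop` with `errors='ignore'`, which is not from this thread but should, to my knowledge, skip labels that are absent from the index.

```python
import pandas as pd

series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']

# True where the label appears in list1; 'e' and 'f' simply never match.
mask = series1.index.isin(list1)
print(mask)  # boolean NumPy array, one entry per label

# Invert the mask to keep everything that is NOT in list1.
filtered = series1[~mask]
print(filtered)

# Alternative (not from the thread): drop and ignore missing labels.
dropped = series1.drop(list1, errors='ignore')
print(dropped.equals(filtered))
```

Because the mask is positional, the surviving rows keep their original order.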

Performance

I was curious, so here is a small speed test comparing my solution to Henry's (the Index.difference approach above). Of course, this is a micro-optimisation that is only worth considering if you are dealing with large amounts of data or indexing many times.

Setup

series1 = pd.Series({'a':8, 'b':7,'c':6, 'd':5})
list1 = ['b', 'c','e','f']

n = 100_000
# repeat each element of series1 n times
s = series1.repeat(n)

>>> s.shape
(400000,)

Results

>>> %timeit s[~s.index.isin(list1)]

15 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit s[s.index.difference(list1)]

75.3 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
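
%timeit only works in IPython/Jupyter. For anyone who wants to rerun the comparison as a plain script, here is a rough equivalent using the standard timeit module. The helper names and the repetition count (number=10) are arbitrary choices of mine, and timings will differ by machine and pandas version, so no expected numbers are shown.

```python
import timeit
import pandas as pd

series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']
s = series1.repeat(100_000)  # 400,000 rows with duplicated labels

def with_isin():
    # Boolean mask: keep rows whose label is not in list1.
    return s[~s.index.isin(list1)]

def with_difference():
    # Label selection: keep rows whose label survives the set difference.
    return s[s.index.difference(list1)]

# Check that both approaches keep the same rows before comparing speed.
same_rows = with_isin().equals(with_difference())

t_isin = timeit.timeit(with_isin, number=10) / 10
t_diff = timeit.timeit(with_difference, number=10) / 10
print(f"same rows: {same_rows}")
print(f"isin:       {t_isin * 1e3:.1f} ms per call")
print(f"difference: {t_diff * 1e3:.1f} ms per call")
```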