I have a pandas Series and would like to remove multiple items using a list of keys to delete, but this list includes elements that are not among the Series keys.
I can do this using the commands below, but I believe there must be a more elegant way to achieve it.
import pandas as pd
series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']
# drop only the keys that actually exist in the Series
series1.drop(set(series1.keys()).intersection(set(list1)))
Result:
a 8
d 5
dtype: int64
Any ideas?
CodePudding user response:
We can filter with Index.difference, which performs the same set difference without the explicit set conversions. In pandas, selecting the labels to keep is often faster and shorter than dropping the labels to remove:
series1[series1.index.difference(list1)]
a 8
d 5
dtype: int64
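One detail worth knowing about this approach: by default, Index.difference returns the remaining labels sorted, so the result may not keep the original order of the Series. A minimal sketch, assuming a deliberately unsorted index (this reordered series1 is made up for illustration and is not the Series from the question):

```python
import pandas as pd

# Hypothetical variant of the question's data with an unsorted index
series1 = pd.Series({'d': 5, 'c': 6, 'b': 7, 'a': 8})
list1 = ['b', 'c', 'e', 'f']

# Labels in list1 that are missing from the index ('e', 'f') are ignored
kept = series1.index.difference(list1)
result = series1[kept]

print(list(series1.index))  # original order of the labels
print(list(kept))           # remaining labels, sorted by difference()
print(result)
```

If the original order matters, a boolean mask built on the index keeps it as is.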
CodePudding user response:
You can use Index.isin to produce a boolean mask and then invert it:
series1 = series1[~series1.index.isin(list1)]
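A small self-contained sketch of the mask, using the data from the question. As a cross-check it also calls Series.drop with errors="ignore", an option not mentioned in this thread that skips labels missing from the index; treat it as one more possibility rather than a recommendation.

```python
import pandas as pd

series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']

# True where the index label appears in list1
mask = series1.index.isin(list1)

# Invert the mask to keep only the labels that are NOT in list1
filtered = series1[~mask]

# Cross-check: drop() with errors="ignore" removes only the labels that exist
dropped = series1.drop(list1, errors="ignore")

print(mask)
print(filtered)
print(filtered.equals(dropped))
```

Unlike a set-based approach, the mask preserves the original order of the Series and also works when the index contains duplicate labels.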
Performance
I was curious, so here is a small speed test comparing my solution with Henry's. Of course, this is a micro-optimisation that is only worth taking into account if you are dealing with a lot of data or indexing many times.
Setup
series1 = pd.Series({'a':8, 'b':7,'c':6, 'd':5})
list1 = ['b', 'c','e','f']
n = 100_000
# repeat series1 n times
s = series1.repeat(n)
>>> s.shape
(400000,)
Results
>>> %timeit s[~s.index.isin(list1)]
15 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit s[s.index.difference(list1)]
75.3 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
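For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard-library timeit module instead of %timeit. The helper names and the repeat counts are arbitrary choices for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

series1 = pd.Series({'a': 8, 'b': 7, 'c': 6, 'd': 5})
list1 = ['b', 'c', 'e', 'f']
s = series1.repeat(100_000)  # 400,000 rows, as in the setup above


def with_isin():
    # inverted boolean mask built on the index
    return s[~s.index.isin(list1)]


def with_difference():
    # label-based selection of the remaining keys
    return s[s.index.difference(list1)]


runs = 5  # calls per timing; arbitrary, kept small so the script finishes quickly
t_isin = min(timeit.repeat(with_isin, number=runs, repeat=3)) / runs
t_diff = min(timeit.repeat(with_difference, number=runs, repeat=3)) / runs

print(f"isin:       {t_isin * 1000:.1f} ms per call")
print(f"difference: {t_diff * 1000:.1f} ms per call")
```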