Given a selector Series
, that is a Series
treated as a query of a DataFrame
in that the selector Series
's index is a subset of the rows of a DataFrame
to be queried, and whose values are to be matched in the query, currently, I'm doing the following:
import pandas as pd
df = pd.DataFrame([{'a':1,'b':2,'c':3},{'a':11,'b':22,'c':33}, {'a':1,'b':22,'c':333}])
selector_series = pd.Series({'a':1,'b':22})
desired_df = df[pd.DataFrame([df[x]==selector_series[x] for x in selector_series.index.intersection(df.columns)]).all()]
print(desired_df)
Output:
a b c
2 1 22 333
However, I suspect this kind of thing is a common enough operation that there is a more "Pandasic" (as opposed to Pythonic) way of doing the same thing. Is there?
CodePudding user response:
A more natural approach for pandas would be not to use loops but rather the eq
method for comparison on the common indices:
idx = df.columns.intersection(selector_series.index)
df.loc[df[idx].eq(selector_series.loc[idx]).all(1)]
CodePudding user response:
Pandas comparison methods like eq
perform an alignment operation. This is typically used to align for "outer" comparison:
Take this modified example
df = pd.DataFrame({'a': [1, 11, 1], 'b': [2, 22, 22], 'c': [3, 33, 333]})
selector_series = pd.Series({'a': 1, 'b': 22, 'd': 156})
df.eq(selector_series)
a b c d
0 True False False False
1 False True False False
2 True True False False
Notice how the union of columns between the Series and the DataFrame are compared. In this case, however, we do not want this since c
(only being in the DataFrame) and d
(only being in the Series) can only ever be False
.
We can instead, explicitly align
to override the default how
with 'inner'
.
Here is a helper method:
def intersection_eq(df_: pd.DataFrame, s_: pd.Series) -> pd.DataFrame:
aligned_df_, aligned_s_ = df_.align(s_, join='inner', axis=1, copy=False)
return aligned_df_.eq(aligned_s_)
Now this operation returns only the common columns in both or the intersection.
intersection_eq(df, selector_series)
a b
0 True False
1 False True
2 True True
Now we can filter as typical with all
on axis=1
:
df[intersection_eq(df, selector_series).all(axis=1)]
a b c
2 1 22 333
Naturally, we could also add this to our DataFrame definition, or create a SubClass of DataFrame to allow for more natural syntax:
def intersection_eq(df_: pd.DataFrame, s_: pd.Series) -> pd.DataFrame:
aligned_df_, aligned_s_ = df_.align(s_, join='inner', axis=1, copy=False)
return aligned_df_.eq(aligned_s_)
pd.DataFrame.intersection_eq = intersection_eq
Which allows for usage like:
df[df.intersection_eq(selector_series).all(axis=1)]
a b c
2 1 22 333