How is the equality operator in Pandas treated specially by Python?

Time:08-17

df[df['Some Column'] == 5]

If we look at the condition in this statement, we have an equality comparison between df['Some Column'] and 5. I would expect this to return False, since a Series object is not equal to 5. With pandas, however, it returns a boolean Series, which is then used to filter the DataFrame. How is this done behind the scenes? How does pandas (or Python) ensure that False is not returned?

Answer:

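Python itself does not treat pandas specially: a == b simply calls type(a).__eq__(a, b), and that method is free to return any object, not only True or False. Via inheritance, pandas.Series.__eq__ is (abridged from the pandas source):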

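# pandas/core/arraylike.py
class OpsMixin:
    ...
    @unpack_zerodim_and_defer("__eq__")
    def __eq__(self, other):
        # `==` lands here; hand off to the subclass's comparison hook
        return self._cmp_method(other, operator.eq)


# pandas/core/series.py (base.IndexOpsMixin subclasses OpsMixin)
class Series(base.IndexOpsMixin, generic.NDFrame):
    ...
    def _cmp_method(self, other, op):
        res_name = ops.get_op_result_name(self, other)

        # Two Series must share the same index labels to be compared
        if isinstance(other, Series) and not self._indexed_same(other):
            raise ValueError("Can only compare identically-labeled Series objects")

        # Unwrap both sides to plain arrays (e.g. NumPy ndarrays)
        lvalues = self._values
        rvalues = extract_array(other, extract_numpy=True, extract_range=True)

        # Element-wise comparison on the arrays, e.g. operator.eq(lvalues, rvalues)
        with np.errstate(all="ignore"):
            res_values = ops.comparison_op(lvalues, rvalues, op)

        # Wrap the boolean array back into a Series with the same index
        return self._construct_result(res_values, name=res_name)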
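To see that there is no special casing in Python itself, here is a minimal sketch using a hypothetical toy class, Column (not part of pandas), whose __eq__ returns a list of booleans and whose __getitem__ accepts such a list as a mask. It mimics, in a much simplified form, the two steps that df[df['Some Column'] == 5] relies on.

```python
class Column:
    """Hypothetical toy container, used only to show how `==` dispatches."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Python evaluates `a == b` by calling type(a).__eq__(a, b).
        # Nothing forces the result to be a bool, so return one bool per element.
        return [v == other for v in self.values]

    def __getitem__(self, mask):
        # Boolean-mask selection: keep the values whose mask entry is True.
        return [v for v, keep in zip(self.values, mask) if keep]


col = Column([1, 2, 3, 2])
mask = col == 2
print(mask)       # [False, True, False, True]
print(col[mask])  # [2, 2]
```

pandas does the same thing with real arrays and an index instead of Python lists.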

So when you do:

pd.Series([1, 2, 3, 4]) == 2

the boolean Series that comes back wraps the result of:

import operator
import numpy as np

operator.eq(np.array([1, 2, 3, 4]), 2)
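A quick check of that claim, as a sketch that assumes pandas and NumPy are installed: the Series comparison and the plain array comparison agree element by element, and the boolean Series is what makes the filtering in the question work.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Plain NumPy comparison: an ndarray with one bool per element.
raw = np.array([1, 2, 3, 4]) == 2

# Series comparison: the same bools, wrapped in a Series that keeps s's index.
mask = s == 2

print(raw.tolist())                          # [False, True, False, False]
print(mask.dtype)                            # bool
print(np.array_equal(mask.to_numpy(), raw))  # True

# Using a boolean Series inside [] keeps only the rows where it is True,
# which is what df[df['Some Column'] == 5] relies on.
print(s[mask].tolist())                      # [2]
```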

So perhaps the real question is: how do NumPy arrays check for equality?
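As a pointer in that direction, a short sketch: for NumPy arrays, == essentially goes through the np.equal ufunc, which compares element by element and broadcasts a scalar (or an array of compatible shape) against every element.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# The == operator and the np.equal ufunc give the same element-wise result.
via_operator = arr == 2
via_ufunc = np.equal(arr, 2)

# The scalar 2 is broadcast against every element: one bool per element.
print(via_operator.tolist())                    # [False, True, False, False]
print(np.array_equal(via_operator, via_ufunc))  # True

# Broadcasting also works between arrays: a (2, 3) grid against a length-3 row.
grid = np.arange(6).reshape(2, 3) == np.array([0, 4, 2])
print(grid.tolist())  # [[True, False, True], [False, True, False]]
```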
