I am using pandas and have run into a few occasions where I have a programmatically generated list of conditionals, like so
conditionals = [
    df['someColumn'] == 'someValue',
    df['someOtherCol'] == 'someOtherValue',
    df['someThirdCol'].isin(['foo','bar','baz']),
]
and I want to select rows where ALL of these conditions are true. I figure I'd do something like this (using pandas' element-wise & operator, since plain and won't work on Series):
bigConditional = IHaveNoIdeaOfWhatToPutHere
for conditional in conditionals:
    bigConditional = bigConditional & conditional
filteredDf = df[bigConditional]
I know that I WANT to use the identity property: bigConditional should be initialized to a Series of True for every index in my dataframe, so that if any condition in my conditionals list evaluates to False that row won't be in the filtered dataframe, but initially every row is considered.
I don't know how to do that, or at least not in the most succinct way that makes it clear it's intentional.
I've also run into the inverse scenario, where only one of the conditionals needs to match to include the row in the new dataframe, so there I would need bigConditional to be set to False for every index in the dataframe.
CodePudding user response:
Maybe you can use query and generate your conditions as strings, like this:
conditionals = [
    "someColumn == 'someValue'",
    "someOtherCol == 'someOtherValue'",
    "someThirdCol.isin(['foo', 'bar', 'baz'])",
]
qs = ' & '.join(conditionals)
out = df.query(qs)
Or use eval to create a boolean mask instead of filtering your dataframe directly:
mask = df.eval(qs)
Demo
Suppose this dataframe:
>>> df
someColumn someOtherCol someThirdCol
0 someValue someOtherValue foo
1 someValue someOtherValue baz
2 someValue anotherValue anotherValue
3 anotherValue anotherValue anotherValue
>>> df.query(qs)
someColumn someOtherCol someThirdCol
0 someValue someOtherValue foo
1 someValue someOtherValue baz
>>> df.eval(qs)
0 True
1 True
2 False
3 False
dtype: bool
You can even use f-strings or another templating mechanism to pass variables into your condition list.
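For instance, a condition list could be built from runtime variables with f-strings (the variable names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'someColumn': ['someValue', 'anotherValue'],
    'someThirdCol': ['foo', 'qux'],
})

target = 'someValue'              # value supplied at runtime
allowed = ['foo', 'bar', 'baz']

conditionals = [
    f"someColumn == '{target}'",
    f"someThirdCol.isin({allowed})",   # the list's repr is a valid query literal
]
qs = ' & '.join(conditionals)
print(df.query(qs))               # only row 0 matches both conditions
```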
CodePudding user response:
What about summing the conditions and checking whether the sum equals the number of conditions:
filteredDf = df.loc[sum(conditionals)==len(conditionals)]
Or, even simpler, with np.all:
filteredDf = df.loc[np.all(conditionals, axis=0)]
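For the inverse scenario in the question, where matching any one condition is enough, the same idea works with np.any (a sketch on a small made-up frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

conditionals = [
    df['a'] == 1,     # True for row 0
    df['b'] == 'z',   # True for row 2
]

# keep rows where at least one condition holds
filteredDf = df.loc[np.any(conditionals, axis=0)]
print(filteredDf)     # rows 0 and 2
```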
Otherwise, for your original question, you can create a Series of True indexed like df, and your for loop will work:
bigConditional = pd.Series(True, index=df.index)
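Putting it together, the loop from the question works once the identity element matches the operator: True with & for "all must match", False with | for "any may match" (illustrated on a made-up frame):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'x']})
conditionals = [df['a'] == 1, df['b'] == 'x']

# AND: start from the identity for &, a Series of True
bigConditional = pd.Series(True, index=df.index)
for conditional in conditionals:
    bigConditional = bigConditional & conditional
allMatch = df[bigConditional]       # only row 0 satisfies both conditions

# OR: start from the identity for |, a Series of False
anyConditional = pd.Series(False, index=df.index)
for conditional in conditionals:
    anyConditional = anyConditional | conditional
anyMatch = df[anyConditional]       # every row satisfies at least one
```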