I am using pandas and have run into a few occasions where I have a programmatically generated list of conditionals, like so
conditionals = [
    df['someColumn'] == 'someValue',
    df['someOtherCol'] == 'someOtherValue',
    df['someThirdCol'].isin(['foo','bar','baz']),
]
and I want to select rows where ALL of these conditions are true. I figure I'd do something like this (using pandas' element-wise & operator, since plain and won't work on Series):
bigConditional = IHaveNoIdeaOfWhatToPutHere
for conditional in conditionals:
    bigConditional = bigConditional & conditional
filteredDf = df[bigConditional]
I know that I WANT to use the identity property: bigConditional should be initialized to a Series of True for every index in my dataframe, so that if any condition in my conditionals list evaluates to False that row won't be in the filtered dataframe, but initially every row is considered.
I don't know how to do that, or at least not in the most succinct way that makes it clear it's intentional.
I've also run into the inverse scenario, where only one of the conditionals needs to match to include the row in the new dataframe, so there I would need bigConditional to be set to False for every index in the dataframe.
CodePudding user response:
Maybe you can use query and generate your conditions as strings, like this:
conditionals = [
    "someColumn == 'someValue'",
    "someOtherCol == 'someOtherValue'",
    "someThirdCol.isin(['foo', 'bar', 'baz'])",
]
qs = ' & '.join(conditionals)
out = df.query(qs)
Or use eval to create a boolean mask instead of filtering your dataframe directly:
mask = df.eval(qs)
Demo
Suppose this dataframe:
>>> df
someColumn someOtherCol someThirdCol
0 someValue someOtherValue foo
1 someValue someOtherValue baz
2 someValue anotherValue anotherValue
3 anotherValue anotherValue anotherValue
>>> df.query(qs)
someColumn someOtherCol someThirdCol
0 someValue someOtherValue foo
1 someValue someOtherValue baz
>>> df.eval(qs)
0 True
1 True
2 False
3 False
dtype: bool
You can even use f-strings or another templating mechanism to pass variables into your condition list.
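For instance, a condition list could be built from runtime variables with f-strings (the variable names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'someColumn': ['someValue', 'anotherValue'],
    'someThirdCol': ['foo', 'qux'],
})

target = 'someValue'              # value supplied at runtime
allowed = ['foo', 'bar', 'baz']

conditionals = [
    f"someColumn == '{target}'",
    f"someThirdCol.isin({allowed})",   # the list's repr is a valid query literal
]
qs = ' & '.join(conditionals)
print(df.query(qs))               # only row 0 matches both conditions
```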
CodePudding user response:
What about summing the conditions and checking whether the sum equals the number of conditions:
filteredDf = df.loc[sum(conditionals)==len(conditionals)]
Or, even simpler, with np.all:
filteredDf = df.loc[np.all(conditionals, axis=0)]
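For the inverse scenario in the question, where matching any one condition is enough, the same idea works with np.any (a sketch on a small made-up frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

conditionals = [
    df['a'] == 1,     # True for row 0
    df['b'] == 'z',   # True for row 2
]

# keep rows where at least one condition holds
filteredDf = df.loc[np.any(conditionals, axis=0)]
print(filteredDf)     # rows 0 and 2
```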
Otherwise, for your original question, you can create a Series of True indexed like df, and your for loop will work:
bigConditional = pd.Series(True, index=df.index)
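Putting it together, the loop from the question works once the identity element matches the operator: True with & for "all must match", False with | for "any may match" (illustrated on a made-up frame):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'x']})
conditionals = [df['a'] == 1, df['b'] == 'x']

# AND: start from the identity for &, a Series of True
bigConditional = pd.Series(True, index=df.index)
for conditional in conditionals:
    bigConditional = bigConditional & conditional
allMatch = df[bigConditional]       # only row 0 satisfies both conditions

# OR: start from the identity for |, a Series of False
anyConditional = pd.Series(False, index=df.index)
for conditional in conditionals:
    anyConditional = anyConditional | conditional
anyMatch = df[anyConditional]       # every row satisfies at least one
```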