I am making a python PyQt5 CSV comparison tool project and the user can add conditions for querying the pandas dataframe one by one before they are executed.
At the moment I have a nested list of conditions with each element containing the field, operation (==,!=,>,<), and value for comparison as strings. With just one condition I can use .query as it takes a string condition:
data.query('{} {} {}'.format(field,operation,value))
But as far as I can tell the formatting for multiple queries would use loc similar to this:
data.loc[(data.query('{} {} {}'.format(field[0],operation[0],value[0]))) & (data.query('{} {} {}'.format(field[1],operation[1],value[1]))) & ...]
Firstly I wanted to make sure my understanding of the loc function was correct (do I need a primary key at the end maybe?).
And secondly, how would I represent this multiple condition query with an unknown number of conditions set?
Thanks
CodePudding user response:
Would this work?
conds = [
f'{f} {o} {v}' for f, o, v in zip(field, operation, value)
]
data.query(' and '.join(conds))
CodePudding user response:
Warning: Not tested, more like a comment but put here for proper format:
data.query
returns a dataframe, you can't just do dataframe1 & dataframe2
. You would do something like
data.query(' AND '.join(['{} {} {}'.format(f, o, v)
for f, o, v in zip(fields, operations, values)
])
)