Home > database >  Why does this pandas DataFrame.query expression work with literals but not a variable substitution?
Why does this pandas DataFrame.query expression work with literals but not a variable substitution?

Time:03-26

I've run into a situation where I can query a pandas dataframe with column names and string literals in the query string

import pandas as pd

data = pd.DataFrame([list('aba'),
                     list('xyz')],
                    columns=['A', 'B', 'C'])

data.query("A == 'a'")
#    A  B  C
# 0  a  b  a

data.query("A.eq('a')")
#    A  B  C
# 0  a  b  a

but when I use a variable substitution instead of the column name, I get errors.

x = 'A'

data.query("@x == 'a'")
# KeyError: 'False: boolean label can not be used without a boolean index'

data.query("@x.eq('a')")
# AttributeError: 'str' object has no attribute 'eq'

What's going on?

CodePudding user response:

Variables referenced by @ in pandas.eval are supposed to reference variables in the local namespace- not columns within the DataFrame. So it interprets variables prefixed by @ as non-literals. (e.g. "@x.eq('a')" is expanded to "'A'.eq('a')", whereas you want "A.eq('a')")

https://pandas.pydata.org/docs/user_guide/enhancingperf.html#local-variables

If you want to substitute column names in and have pandas parse them as such, you can use the string formatting method, or f-string.

import pandas as pd

data = pd.DataFrame([list("aba"), list("xyz")], columns=["A", "B", "C"])

x = "A"

data.query(f'{x} == "a"') 
   A  B  C
0  a  b  a

This way the string is interpolated before pandas gets a hold of it and it can process those variables referring to a column.

  • Related