I have a Polars DataFrame read from a CSV file, and I am trying to filter it by a list of values:
list = [1, 2, 4, 6, 48]
df = (
    pl.read_csv("bm.dat", sep=';', new_columns=["cid1", "cid2", "cid3"])
    .lazy()
    .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
    .collect()
)
I receive an error:
ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.
But when I comment out .lazy() and .collect(), I receive the same error.
I also tried a single filter, .filter(pl.col("cid1") in list), and received the error again.
How can I filter a DataFrame by a list of values with Polars?
Answer:
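Your error comes from using Python's in operator on a Polars expression. Python evaluates pl.col("cid1") in list immediately: it compares the expression with each element of the list and then converts the result to a bool, and an Expr refuses that conversion. This happens while the filter argument is being built, which is why removing .lazy() and .collect() makes no difference. In Polars, use the is_in expression method instead.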
For example:
import polars as pl

df = pl.DataFrame(
    {
        "cid1": [1, 2, 3],
        "cid2": [4, 5, 6],
        "cid3": [7, 8, 9],
    }
)
list = [1, 2, 4, 6, 48]
(
    df.lazy()
    .filter((pl.col("cid1").is_in(list)) & (pl.col("cid2").is_in(list)))
    .collect()
)
shape: (1, 3)
┌──────┬──────┬──────┐
│ cid1 ┆ cid2 ┆ cid3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ 4    ┆ 7    │
└──────┴──────┴──────┘
But if we attempt to use the in operator instead, we get our error again.
(
    df.lazy()
    .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
    .collect()
)
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/corey/.virtualenvs/StackOverflow/lib/python3.10/site-packages/polars/internals/expr/expr.py", line 155, in __bool__
    raise ValueError(
ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.
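The traceback ends in __bool__, which hints at the mechanism: a membership test on a Python list compares the left-hand object with each element and then asks whether the comparison result is truthy. The toy class below imitates that behavior so the failure can be reproduced without Polars. ToyExpr is a hypothetical stand-in written for this illustration, not Polars' actual implementation.

```python
class ToyExpr:
    """Hypothetical stand-in for a lazy expression (not Polars code)."""

    def __init__(self, description):
        self.description = description

    def __eq__(self, other):
        # Comparison does not return True/False; it returns a new deferred expression.
        return ToyExpr(f"({self.description} == {other!r})")

    def __bool__(self):
        # A deferred expression has no single truth value.
        raise ValueError("the truthiness of a ToyExpr is ambiguous")

    def is_in(self, values):
        # The method form stays deferred: no bool conversion is involved.
        return ToyExpr(f"{self.description}.is_in({list(values)!r})")


col = ToyExpr("col('cid1')")

try:
    # list.__contains__ compares each element with col, then calls bool() on the result.
    col in [1, 2, 4]
    outcome = "no error"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

# Using the method instead of the operator builds an expression without raising.
deferred = col.is_in([1, 2, 4])
```

The in operator fails on the first comparison, before any filtering could take place, while is_in simply returns another expression object to be evaluated later.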