How to filter df by value list with Polars?

Time:12-20

I have a Polars DataFrame loaded from a CSV file, and I am trying to filter it by a list of values:

list = [1, 2, 4, 6, 48]

df = (
    pl.read_csv("bm.dat", sep=';', new_columns=["cid1", "cid2", "cid3"])
    .lazy()
    .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
    .collect()
)

I receive an error:

ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.

Even when I comment out .lazy() and .collect(), I get the same error.

I also tried a single condition, .filter(pl.col("cid1") in list), and got the same error.

How to filter df by value list with Polars?

CodePudding user response:

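The error comes from Python's in operator. Python evaluates pl.col("cid1") in list by comparing the expression with each list element and then converting the result to a bool, and Polars refuses to convert an Expr to a bool, hence the ValueError. This is independent of lazy or eager mode. In Polars, use the is_in expression instead, which makes the membership test part of the query.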

For example:

import polars as pl

df = pl.DataFrame(
    {
        "cid1": [1, 2, 3],
        "cid2": [4, 5, 6],
        "cid3": [7, 8, 9],
    }
)


list = [1, 2, 4, 6, 48]
(
    df.lazy()
    .filter((pl.col("cid1").is_in(list)) & (pl.col("cid2").is_in(list)))
    .collect()
)
shape: (1, 3)
┌──────┬──────┬──────┐
│ cid1 ┆ cid2 ┆ cid3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ 4    ┆ 7    │
└──────┴──────┴──────┘

But if we attempt to use the in operator instead, we get our error again.

(
    df.lazy()
    .filter((pl.col("cid1") in list) & (pl.col("cid2") in list))
    .collect()
)
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/corey/.virtualenvs/StackOverflow/lib/python3.10/site-packages/polars/internals/expr/expr.py", line 155, in __bool__
    raise ValueError(
ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.
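
To see why the in operator ends up in __bool__, here is a rough sketch in plain Python. ToyExpr is a hypothetical stand-in, not Polars' actual Expr class; it only mimics the two behaviours that matter here: == returns another expression rather than True or False, and converting an expression to a bool raises.

```python
class ToyExpr:
    """Hypothetical stand-in for a lazy expression (not Polars' real class)."""

    def __init__(self, description):
        self.description = description

    def __eq__(self, other):
        # Comparing builds a new expression instead of returning True/False.
        return ToyExpr(f"({self.description} == {other!r})")

    def __bool__(self):
        # An expression has no value until the query runs, so refuse to guess.
        raise ValueError("the truthiness of a ToyExpr is ambiguous")

    def is_in(self, values):
        # Membership is recorded as part of the expression, nothing is evaluated.
        return ToyExpr(f"{self.description}.is_in({list(values)!r})")


values = [1, 2, 4, 6, 48]
col = ToyExpr("col('cid1')")

try:
    # `in` compares `col` with each element and needs a bool from the result.
    col in values
    outcome = "no error"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

membership = col.is_in(values)

print(outcome)
print(membership.description)
```

The in operator always has to produce a plain True or False, so it cannot be made to return an expression; that is why a separate method such as is_in is needed.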