I'm really new to Polars (v0.15.8)...so I really don't know what I'm doing.
I have a DataFrame and I would like to check whether each value in a column exists within a separately defined list.
For example, here is my list:
list_animal = ['cat', 'mouse', 'dog', 'sloth', 'zebra']
and here is my DataFrame:
df = pl.DataFrame([
pl.Series('thing', ['cat', 'plant', 'mouse', 'dog', 'sloth', 'zebra', 'shoe']),
pl.Series('isAnimal', [None, None, None, None, None, None, None]),
])
...which looks like this:
I would like the df to end up like:
I'm struggling my way through some examples and the Polars documentation. I have found two options:
- use the pl.when function:
df = (df.with_column(
pl.when(
(pl.col("thing") in list_animal)
)
.then(True)
.otherwise(False)
.alias("isAnimal2")
))
However, I get an error:
ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to chain Expr together, not and/or.
or,
- Using the docs here, I tried to follow the examples to apply an expression on the elements of a list. I couldn't make it work, but I tried this:
chk_if_true = pl.element() in list_animal
df.with_column(
pl.col("thing").arr.eval(chk_if_true, parallel=True).alias("isAnimal2")
)
...which gave me this error:
SchemaError: Series of dtype: Utf8 != List
I would appreciate any advice; thanks!
Answer:
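You're looking for .is_in(). Python's in operator tries to reduce the comparison to a single True/False, which a lazy Expr can't provide; that is where the first error comes from. The second error occurs because .arr.eval only works on List columns, and "thing" is Utf8.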
>>> df.with_column(pl.col("thing").is_in(list_animal).alias("isAnimal2"))
shape: (7, 3)
┌───────┬──────────┬───────────┐
│ thing │ isAnimal │ isAnimal2 │
│ ---   │ ---      │ ---       │
│ str   │ f64      │ bool      │
╞═══════╪══════════╪═══════════╡
│ cat   │ null     │ true      │
├───────┼──────────┼───────────┤
│ plant │ null     │ false     │
├───────┼──────────┼───────────┤
│ mouse │ null     │ true      │
├───────┼──────────┼───────────┤
│ dog   │ null     │ true      │
├───────┼──────────┼───────────┤
│ sloth │ null     │ true      │
├───────┼──────────┼───────────┤
│ zebra │ null     │ true      │
├───────┼──────────┼───────────┤
│ shoe  │ null     │ false     │
└───────┴──────────┴───────────┘