I am trying to pick up the Python package polars. I come from an R background, so I appreciate this might be an incredibly easy question.
I want to implement a case statement: if any of the conditions below is true, the row is flagged as 1, otherwise as 0. The new column will be called 'my_new_column_flag'.
However, I am getting the following error message:
Traceback (most recent call last):
  File "", line 2, in
  File "C:\Users\foo\Miniconda3\envs\env\lib\site-packages\polars\internals\lazy_functions.py", line 204, in col
    return pli.wrap_expr(pycol(name))
TypeError: argument 'name': 'int' object cannot be converted to 'PyString'
import polars as pl
import numpy as np
np.random.seed(12)
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
print(df)
df.with_column(
    pl.when(pl.col('nrs') == 1).then(pl.col(1))
    .when(pl.col('names') == 'ham').then(pl.col(1))
    .when(pl.col('random') == 0.014575).then(pl.col(1))
    .otherwise(pl.col(0))
    .alias('my_new_column_flag')
)
Can anyone help?
CodePudding user response:
pl.col selects an existing column by name, and the name must be a string, which is why pl.col(1) raises the TypeError. What you want is a column holding the literal value 1: pl.lit(1)
df.with_columns(
    pl.when(pl.col('nrs') == 1).then(pl.lit(1))
    .when(pl.col('names') == 'ham').then(pl.lit(1))
    .when(pl.col('random') == 0.014575).then(pl.lit(1))
    .otherwise(pl.lit(0))
    .alias('my_new_column_flag')
)
PS: it may be more natural to use a boolean predicate for your flag (and cast it to int if you want 0/1 instead of true/false):
df.with_columns(
    ((pl.col("nrs") == 1) | (pl.col("names") == "ham") | (pl.col("random") == 0.014575))
    .alias("my_new_column_flag")
    .cast(int)
)