How can I calculate the elementwise maximum of two columns in Polars inside an expression?
Polars version = 0.13.31
Problem statement as code:
import polars as pl
import numpy as np
df = pl.DataFrame({
"a": np.arange(5),
"b": np.arange(5)[::-1]
})
# Produce a column named "max(a, b)" with the values [4, 3, 2, 3, 4] using df.select([ ... ])
Things I've tried
Polars claims to support NumPy universal functions (docs), and np.maximum is a ufunc that does exactly what I'm asking for. However, when I try it I get an error:
df.select([
np.maximum(pl.col("a"), pl.col("b")).alias("max(a, b)")
])
# TypeError: maximum() takes from 2 to 3 positional arguments but 1 were given
There appears to be no Polars builtin for this. There is pl.max, but it returns only the single maximum element in an array.
Using .map()
df.select([
pl.col(["a", "b"]).map(np.maximum)
])
# PanicException
Current workaround
I'm able to do this using the following snippet; however, I want to be able to do it inside an expression, as that is much more convenient.
df["max(a, b)"] = np.maximum(df["a"], df["b"])
Answer:
You were close. polars.max, when used with a list of expressions, returns the element-wise max. From the documentation:
List[Expr] -> aggregate the maximum value horizontally.
Thus, for your example:
df.with_column(
pl.max([pl.col('a'), pl.col('b')]).alias('max(a, b)')
)
shape: (5, 3)
┌─────┬─────┬───────────┐
│ a ┆ b ┆ max(a, b) │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════════╡
│ 0 ┆ 4 ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 3 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 1 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ 0 ┆ 4 │
└─────┴─────┴───────────┘
For reference, polars.min, polars.sum, polars.any, and polars.all will also perform element-wise calculations when supplied with a list of expressions.