Is there a way to perform lazy operations in Polars on multiple frames at the same time?

Time:12-18

Lazy mode and query optimization in Polars are a great tool for saving on memory allocations and CPU usage for a single data frame. I wonder if there is a way to do this for multiple lazy frames, such as:

lpdf1 = pdf1.lazy()
lpdf2 = pdf2.lazy()

result_lpdf = -lpdf1/lpdf2
result_pdf = result_lpdf.collect()

The above code will not run, as division and negation are not implemented for LazyFrame. My aim is to create the new result_pdf frame without creating a temporary frame for the division and then another for the negation (as would be the case in pandas and NumPy).

I'm trying to get some performance improvement relative to -pdf1/pdf2, on frames of size (283681, 93). Any suggestions are welcome.

CodePudding user response:

You can use .with_context()

Adding a suffix to one set of columns allows you to distinguish between them.

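import polars as pl

left = pl.DataFrame(dict(a=[-16, -12, -9], b=[20, 12, 10])).lazy()
right = pl.DataFrame(dict(a=[4, 3, 3], b=[10, 2, 5])).lazy()
(
   left
   # expose right's columns to the query under suffixed names
   # (newer Polars releases spell this pl.all().name.suffix("_right"))
   .with_context(right.select(pl.all().suffix("_right")))
   .select(
      # negate each left column and divide it by its counterpart in right
      pl.col(name) * -1 / pl.col(f"{name}_right")
      for name in left.columns
   )
   .collect()
)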
shape: (3, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ f64 ┆ f64  │
╞═════╪══════╡
│ 4.0 ┆ -2.0 │
├─────┼──────┤
│ 4.0 ┆ -6.0 │
├─────┼──────┤
│ 3.0 ┆ -2.0 │
└─────┴──────┘

CodePudding user response:

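In Polars, you can wrap the operation in a function that works on expressions. LazyFrame has no apply() that accepts a second frame, so the function is given column expressions instead, with the second frame made available through with_context() as in the other answer.

As an example:

import polars as pl

def neg_div(x, y):
    return -x / y

result_lpdf = lpdf1.with_context(lpdf2.select(pl.all().suffix("_right"))).select(
    neg_div(pl.col(name), pl.col(f"{name}_right")) for name in lpdf1.columns
)
result_pdf = result_lpdf.collect()

neg_div is called once per pair of matching columns and returns an expression, so nothing is computed until collect() builds the resulting frame with the transformed elements.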
