import polars as pl
df = pl.read_csv("https: // j.mp/ iriscsv".replace(" ", ""))
change = df.with_columns((pl.col("sepal_width").shift_and_fill(1, 0).pct_change(1).alias("pct_1"))).with_columns(
[
pl.when(pl.col("pct_1").is_infinite()).then(float(0)).otherwise(pl.col("pct_1")).fill_null(float(0)).keep_name(),
pl.col("pct_1").cumsum().alias("cumsum_pct_1")
]
)
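Given a dataset, I want to calculate a cumulative sum (cumsum) after applying pct_change. However, the cumsum results are inf even after I replace the infinite and null values. I have searched for a while without finding a solution. Could someone help me, please?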
Now:
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species ┆ pct_1 ┆ cumsum_pct_1 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ inf │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa ┆ -0.142857 ┆ inf │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa ┆ 0.066667 ┆ inf │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
Expected:
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species ┆ pct_1 ┆ cumsum_pct_1 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0│
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa ┆ 0.0 ┆ 0.0│
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa ┆ -0.142857 ┆ -0.142857│
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa ┆ 0.066667 ┆ -0.076190│
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
CodePudding user response:
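I think the problem is a small misunderstanding of how expressions work inside a context (here, "with_columns"). All expressions inside a context run in parallel against the same input frame, so the changes one expression makes to a column aren't visible to the other expressions in that context. In your code, cumsum still reads the original pct_1, which contains inf. To solve the problem, you have to do it stepwise, with one with_columns call per dependent step: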
change = (
df.with_columns(
(pl.col("sepal_width").shift_and_fill(1, 0).pct_change(1).alias("pct_1"))
)
.with_columns(
[
pl.when(pl.col("pct_1").is_infinite())
.then(float(0))
.otherwise(pl.col("pct_1"))
.fill_null(float(0))
.keep_name()
]
)
.with_columns([pl.col("pct_1").cumsum().alias("cumsum_pct_1")])
)