Calculate cumulative sum after percent change polar?

Time:12-30

import polars as pl

df = pl.read_csv("https://j.mp/iriscsv")
change = df.with_columns(
    pl.col("sepal_width").shift_and_fill(1, 0).pct_change(1).alias("pct_1")
).with_columns(
    [
        pl.when(pl.col("pct_1").is_infinite()).then(float(0)).otherwise(pl.col("pct_1")).fill_null(float(0)).keep_name(),
        pl.col("pct_1").cumsum().alias("cumsum_pct_1"),
    ]
)

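Given a dataset, I want to calculate the cumulative sum after applying pct_change. However, cumsum_pct_1 is still inf even though I replace the infinite values and fill the nulls in pct_1. I have searched for a while without finding a solution. Could someone help me, please?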

Now:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   ┆ pct_1     ┆ cumsum_pct_1 │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       ┆ ---       ┆ ---          │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       ┆ f64       ┆ f64          │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ null         │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ inf          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    ┆ -0.142857 ┆ inf          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    ┆ 0.066667  ┆ inf          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤

Expected:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬───────────┬──────────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   ┆ pct_1     ┆ cumsum_pct_1 │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       ┆ ---       ┆ ---          │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       ┆ f64       ┆ f64          │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪═══════════╪══════════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    ┆ 0.0       ┆ 0.0          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    ┆ -0.142857 ┆ -0.142857    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    ┆ 0.066667  ┆ -0.076190    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤

Answer:

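I think the problem is a small misunderstanding of how expressions work inside a context ("with_columns"). All expressions inside a context run in parallel, so the changes one expression makes to a column aren't visible to the other expressions in the same context. In your second with_columns, the cumsum expression therefore still reads the original pct_1, which contains the null and inf values, not the cleaned version. To solve the problem, do it stepwise and put the cumsum in its own with_columns: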

change = (
    df.with_columns(
        (pl.col("sepal_width").shift_and_fill(1, 0).pct_change(1).alias("pct_1"))
    )
    .with_columns(
        [
            pl.when(pl.col("pct_1").is_infinite())
            .then(float(0))
            .otherwise(pl.col("pct_1"))
            .fill_null(float(0))
            .keep_name()
        ]
    )
    .with_columns([pl.col("pct_1").cumsum().alias("cumsum_pct_1")])
)
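
To see where the inf comes from, the arithmetic can be reproduced without Polars. The following is only a plain-Python sketch of the behaviour described above, not Polars code: the helpers pct_change, clean and running_sum are hypothetical names written for this illustration, and the four sepal_width values are taken from the table in the question. It computes the running sum once over the raw percent changes (what cumsum sees when it shares a context with the cleanup) and once over the cleaned values (what it sees in a later context).

```python
import math

# First four sepal_width values from the question's table.
sepal_width = [3.5, 3.0, 3.2, 3.1]


def pct_change(values):
    """Percent change against the previous element; the first entry is None."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            # Python raises on float division by zero, so mimic IEEE 754 here.
            out.append(math.nan if cur == 0 else math.copysign(math.inf, cur))
        else:
            out.append((cur - prev) / prev)
    return out


def clean(values):
    """Replace None and infinite entries with 0.0, like the when/then/fill_null step."""
    return [0.0 if v is None or math.isinf(v) else v for v in values]


def running_sum(values):
    """Cumulative sum that leaves None entries as None (assumed null handling)."""
    total, out = 0.0, []
    for v in values:
        if v is None:
            out.append(None)
        else:
            total += v
            out.append(total)
    return out


# Equivalent of shift_and_fill(1, 0): shift down by one and fill the gap with 0.
shifted = [0.0] + sepal_width[:-1]
raw = pct_change(shifted)  # starts with None, then inf from the division by 0

same_context = running_sum(raw)      # cumsum over the uncleaned column
stepwise = running_sum(clean(raw))   # cumsum over the cleaned column

print(same_context)
print([round(v, 6) for v in stepwise])
```

Once a single inf enters the running sum, every later total stays inf, which matches the "Now" table; summing the cleaned values gives the totals in the "Expected" table.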

