Multiply Columns Together Based on Condition-CodePudding

Is there a way for me to dynamically multiply columns together based on a value in another column in Python? I'm using Polars if that makes a difference. For example, if calendar_year is 2018, I'd want to multiply columns 2018, 2019, 2020, and 2021 together, but if calendar_year is 2019, I'd only want to multiply columns 2019, 2020, and 2021 together. I'd like to store the result in a new column called product. In the future, we'll have additional columns such as 2022, and 2023, so I'd love the ability to have my formula account for these new columns without having to go into the code base each year and add them to my product manually.

id	...	calendar_year	2017	2018	2019	2020	2021	product
123	...	2018	0.998	0.997	0.996	0.995	0.994	0.9801
456	...	2019	0.993	0.992	0.991	0.990	0.989	0.9557

Thanks in advance for the help!

CodePudding user response：

I think you could use the np.where(condition,then,else) function to do that. Do you want to create a new column with the result of that operation? This could work

df['2018_result'] = np.where(df.calendar_year.isin(['2019','2020','2021']),df.2019*df.2020*df.2021, 'add more calculations')

Would be great if you can provide more details :)

CodePudding user response：

It looks like you want to multiply CY factors for all years beyond calendar_year, and not have to update this logic for each year.

If that's the case, one way to avoid hard-coding the CY selections is to use melt and filter the results.

(
    df
    .select([
        'id',
        'calendar_year',
        pl.col('^20\d\d$')
    ])
    .melt(
        id_vars=['id', 'calendar_year'],
        variable_name='CY',
        value_name='CY factor',
    )
    .with_column(pl.col('CY').cast(pl.Int64))
    .filter(pl.col('CY') >= pl.col('calendar_year'))
    .groupby('id')
    .agg(
        pl.col('CY factor').product().alias('product')
    )
)

shape: (2, 2)
┌─────┬──────────┐
│ id  ┆ product  │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 456 ┆ 0.970298 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 123 ┆ 0.982119 │
└─────┴──────────┘

From there, you can join the result back to your original dataset (using the id column).