Is there a way for me to dynamically multiply columns together based on a value in another column in Python? I'm using Polars if that makes a difference. For example, if calendar_year
is 2018, I'd want to multiply columns 2018
, 2019
, 2020
, and 2021
together, but if calendar_year
is 2019, I'd only want to multiply columns 2019
, 2020
, and 2021
together. I'd like to store the result in a new column called product
. In the future, we'll have additional columns such as 2022
, and 2023
, so I'd love the ability to have my formula account for these new columns without having to go into the code base each year and add them to my product manually.
id | ... | calendar_year | 2017 | 2018 | 2019 | 2020 | 2021 | product |
---|---|---|---|---|---|---|---|---|
123 | ... | 2018 | 0.998 | 0.997 | 0.996 | 0.995 | 0.994 | 0.9801 |
456 | ... | 2019 | 0.993 | 0.992 | 0.991 | 0.990 | 0.989 | 0.9557 |
Thanks in advance for the help!
CodePudding user response:
I think you could use the np.where(condition,then,else)
function to do that.
Do you want to create a new column with the result of that operation? This could work
df['2018_result'] = np.where(df.calendar_year.isin(['2019','2020','2021']),df.2019*df.2020*df.2021, 'add more calculations')
Would be great if you can provide more details :)
CodePudding user response:
It looks like you want to multiply CY factors for all years beyond calendar_year
, and not have to update this logic for each year.
If that's the case, one way to avoid hard-coding the CY selections is to use melt
and filter the results.
(
df
.select([
'id',
'calendar_year',
pl.col('^20\d\d$')
])
.melt(
id_vars=['id', 'calendar_year'],
variable_name='CY',
value_name='CY factor',
)
.with_column(pl.col('CY').cast(pl.Int64))
.filter(pl.col('CY') >= pl.col('calendar_year'))
.groupby('id')
.agg(
pl.col('CY factor').product().alias('product')
)
)
shape: (2, 2)
┌─────┬──────────┐
│ id ┆ product │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪══════════╡
│ 456 ┆ 0.970298 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 123 ┆ 0.982119 │
└─────┴──────────┘
From there, you can join the result back to your original dataset (using the id
column).