Creating a new variable column that increases/decreases based on an existing column-CodePudding

I am trying to build a model where the price benchmark of electricity increases when wind energy is down and vice versa when there's lots of wind.

my dataframe looks like this:

   df:

                          wind      Cost
2021-01-01 01:00:00   4281.000000  12.72250
2021-01-01 02:00:00   4384.083333  11.34000
2021-01-01 03:00:00   4405.666667  11.34000
2021-01-01 04:00:00   4514.666667   9.93300
2021-01-01 05:00:00   4692.416667   9.49200
...                           ...       ...
2021-12-31 20:00:00   9698.000000  32.87550
2021-12-31 21:00:00   9854.083333  34.38225
2021-12-31 22:00:00   9880.916667  29.61000
2021-12-31 23:00:00  10356.500000  11.76000
2022-01-01 00:00:00  10478.500000  15.75000`

I've used df.describe() to summarise the data and got this:

         wind           Cost
count 8760.000000   8760.000000
mean 5588.449878    22.131348
std 3774.164710 9.547735
min 56.333333   0.042000
25% 2297.604167 13.475437
50% 4792.375000 20.160000
75% 8710.187500 33.012000
max 14132.166667    34.996500

How do I please create a new column for every hour and depending on the wind in the system, the price is set. So for example when the value under the wind column falls in the min-25% percentile, the cost goes up by 5. 25-50%, cost by 2.5. However, when wind value falls between 75%-max, cost decreases by 5, and 50%-75% cost goes down by 2.5.

Any help or being pointed in the right direction would be helpful. Many thanks

CodePudding user response：

I think pd.cut would solve your problem.

For example,

import numpy as np
import pandas as pd


df["cost_difference"] = pd.cut(
    df["wind"],
    bins=np.quantile(df["wind"], [0,0.25,0.5,0.75,1]), 
    include_lowest=True, 
    labels=[5, 2.5, -2.5, -5]).astype(float)

EDIT: I forgot about pd.qcut, which basically packages np.quantile into cut, thereby saving you a step:

df["cost_difference"] = pd.qcut(
    df["wind"],
    q=[0, 0.25, 0.5, 0.75, 1],
    labels=[5, 2.5, -2.5, -5],
).astype(float)

This adds a column "cost_difference" that has the values provided as "labels" for each range of values, by quantile.

So for this case wind range maps to cost difference as

   0-25%  ->  5
 25%-50%  ->  2.5
 50%-75%  -> -2.5
 75%-100% -> -5

then you could take df["Cost"] df["cost_difference"] to get your final cost.

EDIT2: added cast to float so that addition works.