I am trying to build a model where the price benchmark of electricity increases when wind energy is down and vice versa when there's lots of wind.
my dataframe looks like this:
df:
wind Cost
2021-01-01 01:00:00 4281.000000 12.72250
2021-01-01 02:00:00 4384.083333 11.34000
2021-01-01 03:00:00 4405.666667 11.34000
2021-01-01 04:00:00 4514.666667 9.93300
2021-01-01 05:00:00 4692.416667 9.49200
... ... ...
2021-12-31 20:00:00 9698.000000 32.87550
2021-12-31 21:00:00 9854.083333 34.38225
2021-12-31 22:00:00 9880.916667 29.61000
2021-12-31 23:00:00 10356.500000 11.76000
2022-01-01 00:00:00 10478.500000 15.75000`
I've used df.describe() to summarise the data and got this:
wind Cost
count 8760.000000 8760.000000
mean 5588.449878 22.131348
std 3774.164710 9.547735
min 56.333333 0.042000
25% 2297.604167 13.475437
50% 4792.375000 20.160000
75% 8710.187500 33.012000
max 14132.166667 34.996500
How do I please create a new column for every hour and depending on the wind in the system, the price is set. So for example when the value under the wind column falls in the min-25% percentile, the cost goes up by 5. 25-50%, cost by 2.5. However, when wind value falls between 75%-max, cost decreases by 5, and 50%-75% cost goes down by 2.5.
Any help or being pointed in the right direction would be helpful. Many thanks
CodePudding user response:
I think pd.cut
would solve your problem.
For example,
import numpy as np
import pandas as pd
df["cost_difference"] = pd.cut(
df["wind"],
bins=np.quantile(df["wind"], [0,0.25,0.5,0.75,1]),
include_lowest=True,
labels=[5, 2.5, -2.5, -5]).astype(float)
EDIT: I forgot about pd.qcut, which basically packages np.quantile
into cut
, thereby saving you a step:
df["cost_difference"] = pd.qcut(
df["wind"],
q=[0, 0.25, 0.5, 0.75, 1],
labels=[5, 2.5, -2.5, -5],
).astype(float)
This adds a column "cost_difference"
that has the values provided as "labels" for each range of values, by quantile.
So for this case wind range maps to cost difference as
0-25% -> 5
25%-50% -> 2.5
50%-75% -> -2.5
75%-100% -> -5
then you could take df["Cost"] df["cost_difference"]
to get your final cost.
EDIT2: added cast to float so that addition works.