I have a model I am building and in the test version I am stuck on how to pass multiple conditions to generate a new dataframe from existing ones. I currently have an inefficient function that loops through my dataframes one for each period (1-5) and one for each date in the dataset.
I have created a subset of the data for illustration of my problem so that it is a little less clunky to understand my problem based on feedback from previous users. In my actual dataset I have multiple IDs, periods 1-48 and many more dates.
I am trying to pass through conditions between three dataframes to generate a new one that will later be fed into a new part of my model. I have the completed model in excel but now translating it in pandas. In excel the solution is using nested IF statements with sums.
What I am trying to do is apply to following conditions to my below dataframes:
For every settlement period (1-10) on each row I need to assess the following conditions:
if the cumulative sum (from dataframe 2) is < value x in dataframe 3 (fossil)
AND
if the cumulative sum (from dataframe 2) the relating value from dataframe one is > value x in dataframe 3 (fossil)
THEN: value x in dataframe 3 - the cumulative sum value (from dataframe 2)
otherwise: value from dataframe 1
Dataframes:
DF1
Index | BM Unit ID | Technology | Rank | Gas_Quantile | Coal_Quantile | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|---|---|---|
0 | ID 1 | Gas | 1 | 0 | 130.4332 | 130.932 | 130.78 | 130.58 | 130.568 | |
1 | ID 2 | Gas | 2 | 0 | 339.45 | 342.325 | 322.525 | 312.4 | 303.775 | |
2 | ID 3 | Gas | 3 | 1 | 363.312 | 386.712 | 364.464 | 363.312 | 363.312 | |
3 | ID 4 | Coal | 4 | 0 | 334.4 | 419.5 | 436.7 | 441.9 | 440.5 | |
4 | ID 5 | Gas | 5 | 1 | 370.65 | 370.45 | 359.9 | 326.25 | 326.2 | |
5 | ID 6 | Coal | 6 | 0 | 337 | 423.4 | 423.1 | 427.5 | 427 | |
6 | ID 7 | Gas | 7 | 2 | 240.4065 | 293.169 | 252.2675 | 256.5055 | 261.653 | |
7 | ID 8 | Gas | 8 | 2 | 297.7333 | 360.2667 | 355.4 | 357.0667 | 358.6667 | |
8 | ID 9 | Gas | 9 | 3 | 106.624 | 106.112 | 105.964 | 106 | 106 | |
9 | ID 10 | Gas | 10 | 3 | 432.8 | 430.4 | 430.7 | 431.9 | 432.1 |
DF2
Index | BM Unit ID | Technology | Rank | Gas_Quantile | Coal_Quantile | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|---|---|---|
0 | ID 1 | Gas | 1 | 0 | 130.4332 | 130.932 | 130.78 | 130.58 | 130.568 | |
1 | ID 2 | Gas | 2 | 0 | 469.8832 | 473.257 | 453.305 | 442.98 | 434.343 | |
2 | ID 3 | Gas | 3 | 1 | 833.1952 | 859.969 | 817.769 | 806.292 | 797.655 | |
3 | ID 4 | Coal | 4 | 0 | 1167.595 | 1279.469 | 1254.469 | 1248.192 | 1238.155 | |
4 | ID 5 | Gas | 5 | 1 | 1538.245 | 1649.919 | 1614.369 | 1574.442 | 1564.355 | |
5 | ID 6 | Coal | 6 | 0 | 1875.245 | 2073.319 | 2037.469 | 2001.942 | 1991.355 | |
6 | ID 7 | Gas | 7 | 2 | 2115.652 | 2366.488 | 2289.737 | 2258.448 | 2253.008 | |
7 | ID 8 | Gas | 8 | 2 | 2413.385 | 2726.755 | 2645.137 | 2615.514 | 2611.675 | |
8 | ID 9 | Gas | 9 | 3 | 2520.009 | 2832.867 | 2751.101 | 2721.514 | 2717.675 | |
9 | ID 10 | Gas | 10 | 3 | 2952.809 | 3263.267 | 3181.801 | 3153.414 | 3149.775 |
DF3
settlementPeriod | 1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|---|
settlementDate | Type | |||||
03/01/2022 | Fossil | 2540.10 | 2884.05 | 2322.03 | 2027.54 | 2043.56 |
ZE | 18190.57 | 18261.24 | 18367.98 | 18198.04 | 18072.02 | |
04/01/2022 | Fossil | 2772.00 | 3415.52 | 3534.11 | 3580.13 | 3501.39 |
ZE | 16883.01 | 16655.47 | 16581.57 | 16322.97 | 16027.87 | |
05/01/2022 | Fossil | 2653.98 | 2700.04 | 2186.64 | 1702.38 | 1617.53 |
ZE | 19296.58 | 19774.30 | 20163.37 | 20379.58 | 20584.48 | |
06/01/2022 | Fossil | 11556.75 | 11924.36 | 11581.64 | 11144.16 | 11358.06 |
ZE | 11850.34 | 11698.00 | 11801.11 | 11592.45 | 11671.91 | |
07/01/2022 | Fossil | 2373.65 | 2418.45 | 2221.58 | 2154.57 | 2192.19 |
ZE | 18433.11 | 17909.67 | 17774.73 | 17816.40 | 17868.83 | |
08/01/2022 | Fossil | 6407.98 | 6529.93 | 6075.51 | 5258.58 | 4559.91 |
ZE | 15237.24 | 15360.68 | 14994.30 | 14741.95 | 14695.10 |
Example desired output with one input for period one:
Index | BM Unit ID | Technology | Rank | Gas_Quantile | Coal_Quantile | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|---|---|---|
0 | ID 1 | Gas | 1 | 0 | 130.43 | |||||
1 | ID 2 | Gas | 2 | 0 | ||||||
2 | ID 3 | Gas | 3 | 1 | ||||||
3 | ID 4 | Coal | 4 | 0 | ||||||
4 | ID 5 | Gas | 5 | 1 | ||||||
5 | ID 6 | Coal | 6 | 0 | ||||||
6 | ID 7 | Gas | 7 | 2 | ||||||
7 | ID 8 | Gas | 8 | 2 | ||||||
8 | ID 9 | Gas | 9 | 3 | ||||||
9 | ID 10 | Gas | 10 | 3 |
My current solution which does not seem to be that efficient in my next step is as follows with another loop for each day in another function:
def run_loop_for_day_SP_generation(DF1,DF2,DF3):
modelled_gen_df = pd.DataFrame(index=DF1.index,columns=DF1.columns)
for SP in DF3.columns:
for row in DF1.index:
if row == 0:
modelled_gen_df.loc[row,SP]=DF1.loc[row,SP]
elif DF2.loc[row-1,SP]<DF3[SP][0]:
if DF2.loc[row-1,SP] DF1.loc[row,SP]>DF3[SP][0]:
modelled_gen_df.loc[row,SP]=DF3[SP][0] - DF2.loc[row-1,SP]
else:
modelled_gen_df.loc[row,SP]=DF1.loc[row,SP]
else:
modelled_gen_df.loc[row,SP] = 0
modelled_gen_df[['BM Unit ID','Technology','ave rank','rank']] =DF1[['BM Unit ID','Technology','ave rank']]
return modelled_gen_df
What would the most pythonic way to solve this be? using np.select?
ok more questions. 1) your conditions. there are two if statements back to back, is it "if this cond AND this cond, then x, else y?" 2) why there is only 1 value in df3 and the rest x and y? Is it just for demonstration or is it really like that? 3) you write "settlementPeriod (1-10)", with that you mean for each settlementDate Type "Fossil" right? 4) we don't know Merit_order which occurs in your function. Can't run the code like that.
Update:
There is only one value in DF3 for demonstration. Here I used X and Y to fill it in.
SettlementPeriod is the columns 1-5 as in DF1,2 and 3. Settlement date is the date of the data which is why I need a second function that then loops through all days.
Update: Second function to go through all dates
def model_all_days_generation(DF1,DF2,DF3):
all_dates = DF3.index.get_level_values(0).unique()
modelled_generation_dictionary = {'Date':'dataframe'} ## Top label
for date in all_dates:
single_day_fossil = DF3.loc[:,'Fossil',:].loc[date].to_frame().transpose()
day_modelled_gen = run_loop_for_day_SP_generation(DF1,DF2,DF3)
modelled_generation_dictionary[date] = day_modelled_gen
return modelled_generation_dictionary
EDIT:
Output from debug:
(('03/01/2022', 'Fossil'), settlementPeriod
1 2540.100
2 2884.050
3 2322.026
4 2027.544
5 2043.558
6 1967.350
7 2050.054
8 1917.484
9 1948.606
10 1912.418
11 1998.150
12 2441.200
13 3098.836
14 3052.854
15 3470.942
16 3844.768
17 4515.572
18 5700.036
19 7408.294
20 7944.532
21 7185.508
22 7200.348
23 7052.050
24 7807.184
25 8065.094
26 8011.100
27 8355.068
28 8567.930
29 8657.718
30 8810.142
31 9275.370
32 9910.762
33 10308.158
34 11240.784
35 11660.706
36 11624.170
37 11452.386
38 11219.704
39 10306.176
40 9785.316
41 8583.608
42 7625.128
43 6738.098
44 5965.298
45 5475.074
46 4584.388
47 3761.072
48 2774.104
Name: (03/01/2022, Fossil), dtype: float64)
Error:
KeyError: "None of [Index(['1,', '2,', '3,', '4,', '5,', '6,', '7,', '8,', '9,', '10,', '11,',\n '12,', '13,', '14,', '15,', '16,', '17,', '18,', '19,', '20,', '21,',\n '22,', '23,', '24,', '25,', '26,', '27,', '28,', '29,', '30,', '31,',\n '32,', '33,', '34,', '35,', '36,', '37,', '38,', '39,', '40,', '41,',\n '42,', '43,', '44,', '45,', '46,', '47,', '48'],\n dtype='object')] are in the [columns]"
CodePudding user response:
Here is my approach to your task:
The function choices
will calculate the new values of the columns 1-5 with its conditions.
The function each_date
will calculate that new dataframe for each date where Type == 'Fossil'
cols = list('12345')
# ['1', '2', '3', '4', '5'] # predefine all columns you need here
def choices(c, thresh):
col = c.name
thresh = thresh[col]
cond1 = df2[col].shift() < thresh
cond2 = (df2[col].shift() df1[col]) > thresh
m1 = cond1 & cond2
m2 = cond1 & (~cond2)
m3 = ~cond1
#no need to definde m1, m2, m3 here, but easier to read imo
cond = [m1, m2, m3]
choices = [thresh - df2[col].shift(), df1[col], 0]
return np.select(cond, choices) # Series with the length of df1
def each_date(row):
tmp = df1[cols].apply(choices, thresh=row) # choices will be applied per column
tmp.loc[0, :] = df1.loc[0,cols] # values of the first row of the new df get values of df1
return tmp
#create your dictionary
modelled_generation_dictionary = {'Date':'dataframe'}
#loop through each row, apply `each_date` on each row (which is a Series) and concat some columns to it, I assumed you want to have Unit ID and Technology with it
for row in df3.loc[df3.index.get_level_values(1)=='Fossil'].iterrows():
print(row) # for debugging
res = pd.concat([df1[["BM Unit ID", "Technology"]], each_date(row[1])],axis=1)
# change key from timestamp to Date only and make it a string for format like: "2022-03-01"
modelled_generation_dictionary[f"{row[0][0].date()}"] = res
print(modelled_generation_dictionary)
For debugging: I added the row print(row)
, each row should be a tuple, containing a tuple as 1st element (with two elements, 1st the timestamp, 2nd the Type) and the 2nd element a pandas.Series
, like this:
((Timestamp('2022-03-01 00:00:00'), 'Fossil'), 1 2540.10
2 2884.05
3 2322.03
4 2027.54
5 2043.56
Name: (2022-03-01 00:00:00, Fossil), dtype: float64)
Output of that dict:
Date
dataframe
2022-03-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.9320 130.7800 130.580 130.568
1 1 ID 2 Gas 339.4500 342.3250 322.5250 312.400 303.775
2 2 ID 3 Gas 363.3120 386.7120 364.4640 363.312 363.312
3 3 ID 4 Coal 334.4000 419.5000 436.7000 441.900 440.500
4 4 ID 5 Gas 370.6500 370.4500 359.9000 326.250 326.200
5 5 ID 6 Coal 337.0000 423.4000 423.1000 427.500 427.000
6 6 ID 7 Gas 240.4065 293.1690 252.2675 25.598 52.205
7 7 ID 8 Gas 297.7333 360.2667 32.2930 0.000 0.000
8 8 ID 9 Gas 106.6240 106.1120 0.0000 0.000 0.000
9 9 ID 10 Gas 20.0910 51.1830 0.0000 0.000 0.000
2022-04-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.9320 130.7800 130.5800 130.5680
1 1 ID 2 Gas 339.4500 342.3250 322.5250 312.4000 303.7750
2 2 ID 3 Gas 363.3120 386.7120 364.4640 363.3120 363.3120
3 3 ID 4 Coal 334.4000 419.5000 436.7000 441.9000 440.5000
4 4 ID 5 Gas 370.6500 370.4500 359.9000 326.2500 326.2000
5 5 ID 6 Coal 337.0000 423.4000 423.1000 427.5000 427.0000
6 6 ID 7 Gas 240.4065 293.1690 252.2675 256.5055 261.6530
7 7 ID 8 Gas 297.7333 360.2667 355.4000 357.0667 358.6667
8 8 ID 9 Gas 106.6240 106.1120 105.9640 106.0000 106.0000
9 9 ID 10 Gas 251.9910 430.4000 430.7000 431.9000 432.1000
2022-05-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.932 130.780 130.580 130.568
1 1 ID 2 Gas 339.4500 342.325 322.525 312.400 303.775
2 2 ID 3 Gas 363.3120 386.712 364.464 363.312 363.312
3 3 ID 4 Coal 334.4000 419.500 436.700 441.900 440.500
4 4 ID 5 Gas 370.6500 370.450 359.900 326.250 326.200
5 5 ID 6 Coal 337.0000 423.400 423.100 127.938 53.175
6 6 ID 7 Gas 240.4065 293.169 149.171 0.000 0.000
7 7 ID 8 Gas 297.7333 333.552 0.000 0.000 0.000
8 8 ID 9 Gas 106.6240 0.000 0.000 0.000 0.000
9 9 ID 10 Gas 133.9710 0.000 0.000 0.000 0.000
2022-06-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.9320 130.7800 130.5800 130.5680
1 1 ID 2 Gas 339.4500 342.3250 322.5250 312.4000 303.7750
2 2 ID 3 Gas 363.3120 386.7120 364.4640 363.3120 363.3120
3 3 ID 4 Coal 334.4000 419.5000 436.7000 441.9000 440.5000
4 4 ID 5 Gas 370.6500 370.4500 359.9000 326.2500 326.2000
5 5 ID 6 Coal 337.0000 423.4000 423.1000 427.5000 427.0000
6 6 ID 7 Gas 240.4065 293.1690 252.2675 256.5055 261.6530
7 7 ID 8 Gas 297.7333 360.2667 355.4000 357.0667 358.6667
8 8 ID 9 Gas 106.6240 106.1120 105.9640 106.0000 106.0000
9 9 ID 10 Gas 432.8000 430.4000 430.7000 431.9000 432.1000
2022-07-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.932 130.780 130.580 130.568
1 1 ID 2 Gas 339.4500 342.325 322.525 312.400 303.775
2 2 ID 3 Gas 363.3120 386.712 364.464 363.312 363.312
3 3 ID 4 Coal 334.4000 419.500 436.700 441.900 440.500
4 4 ID 5 Gas 370.6500 370.450 359.900 326.250 326.200
5 5 ID 6 Coal 337.0000 423.400 423.100 427.500 427.000
6 6 ID 7 Gas 240.4065 293.169 184.111 152.628 200.835
7 7 ID 8 Gas 257.9980 51.962 0.000 0.000 0.000
8 8 ID 9 Gas 0.0000 0.000 0.000 0.000 0.000
9 9 ID 10 Gas 0.0000 0.000 0.000 0.000 0.000
2022-08-01
Index BM Unit ID Technology 1 2 3 4 5
0 0 ID 1 Gas 130.4332 130.9320 130.7800 130.5800 130.5680
1 1 ID 2 Gas 339.4500 342.3250 322.5250 312.4000 303.7750
2 2 ID 3 Gas 363.3120 386.7120 364.4640 363.3120 363.3120
3 3 ID 4 Coal 334.4000 419.5000 436.7000 441.9000 440.5000
4 4 ID 5 Gas 370.6500 370.4500 359.9000 326.2500 326.2000
5 5 ID 6 Coal 337.0000 423.4000 423.1000 427.5000 427.0000
6 6 ID 7 Gas 240.4065 293.1690 252.2675 256.5055 261.6530
7 7 ID 8 Gas 297.7333 360.2667 355.4000 357.0667 358.6667
8 8 ID 9 Gas 106.6240 106.1120 105.9640 106.0000 106.0000
9 9 ID 10 Gas 432.8000 430.4000 430.7000 431.9000 432.1000