I am playing around with the Generator Output-Capability Month Report
data from the Independent Electricity System Operator IESO
. It is a publicly available data that shows the capability and output of each major electricity generator in Ontario, Canada. More specifically, I want to see the difference between the capability and the outputs, and how it changes over time.
Unfortunately, I do not like the way that the data is formatted, so I wanted to rearrange/restructure the data frame. But I don't know how to do this, so I was wondering if anyone here could help me with this. I reckon that this isn't going to be easy, but any help would be much appreciated!
If you wish to play around with the actual dataset instead of the simplified dataset that I'll be providing below, feel free to go to IESO Generator Output Capability Month Report and download PUB_GenOutputCapabilityMonth_202001.csv
dataset. I obviously prefer that you try this with the actual dataset, but it's your call.
That being said, Table 1 below is the simplified version of the dataset that only shows the first two generators (i.e., Abkenora Hydro unit & Adelaide Wind unit) in the PUB_GenOutputCapabilityMonth_202001.csv file. Note that, under the Measurement
column, Available Capacity
for the wind unit is essentially the same as the Capability
for other types of generators.
Table 1: Simplified Data
Delivery Date | Generator | Fuel Type | Measurement | Hour 1 | Hour 2 | Hour 3 | Hour 4 | Hour 5 | Hour 6 | Hour 7 | Hour 8 | Hour 9 | Hour 10 | Hour 11 | Hour 12 | Hour 13 | Hour 14 | Hour 15 | Hour 16 | Hour 17 | Hour 18 | Hour 19 | Hour 20 | Hour 21 | Hour 22 | Hour 23 | Hour 24 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-01-01 | ABKENORA | HYDRO | Capability | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
2020-01-01 | ABKENORA | HYDRO | Output | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
2020-01-01 | ADELAIDE | WIND | Available Capacity | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
2020-01-01 | ADELAIDE | WIND | Forecast | 28 | 35 | 32 | 24 | 24 | 20 | 32 | 32 | 26 | 30 | 25 | 21 | 26 | 26 | 37 | 40 | 37 | 39 | 47 | 55 | 49 | 56 | 57 | 57 |
2020-01-01 | ADELAIDE | WIND | Output | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 25 | 29 | 38 | 43 | 34 | 43 | 49 | 57 | 51 | 59 | 58 | 57 |
Table 2: Slightly More Simplified Version of Table 1 (fewer hours)
JanuaryData = data.frame(`Delivery Date`= c('2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01'),
`Generator` = c('ABKENORA', 'ABKENORA', 'ADELAIDE', 'ADELAIDE', 'ADELAIDE'),
`Fuel Type` = c('Hydro', 'Hydro', 'Wind', 'Wind', 'Wind'),
`Measurement` = c('Capability', 'Output', 'Available Capacity', 'Forecast', 'Output'),
`Hour 1` = c('13', '13', '60', '28', '1'),
`Hour 7` = c('13', '13', '60', '32', '0'),
`Hour 13` = c('13', '13', '60', '26', '25'),
`Hour 18` = c('13', '13', '60', '39', '43'),
`Hour 24` = c('13', '13', '60', '57', '57')
)
View(JanuaryData)
Now I want to rearrange/restructure the data and make it look something like Table 3 below. Note that Table 3 is based on Table 2 (i.e., only shows two generators and fewer number of hours). Again, the actual dataset contains dozens of generators and covers all 24 hours in a day by hourly interval (i.e., hour 1, 2, 3, ... , 23, 24). Also, I want to rename wind generator's "Available Capacity" and merge it to Capability
so I can represent all of them under one Capability
column instead of needlessly having two separate columns (i.e., "Available Capacity" column for wind generators & "Capability" column for other types of generators) describing essentially the same thing.
Table 3: Simplified Final Product - Based on Table 2 data
Delivery Date | Generator | Fuel Type | Hour | Capability | Output | Forecast |
---|---|---|---|---|---|---|
2020-01-01 | ABKENORA | Hydro | 1 | 13 | 13 | n/a |
2020-01-01 | ABKENORA | Hydro | 7 | 13 | 13 | n/a |
2020-01-01 | ABKENORA | Hydro | 13 | 13 | 13 | n/a |
2020-01-01 | ABKENORA | Hydro | 18 | 13 | 13 | n/a |
2020-01-01 | ABKENORA | Hydro | 24 | 13 | 13 | n/a |
2020-01-01 | ADELAIDE | WIND | 1 | 60 | 1 | 28 |
2020-01-01 | ADELAIDE | WIND | 7 | 60 | 0 | 32 |
2020-01-01 | ADELAIDE | WIND | 13 | 60 | 25 | 26 |
2020-01-01 | ADELAIDE | WIND | 18 | 60 | 43 | 39 |
2020-01-01 | ADELAIDE | WIND | 24 | 60 | 57 | 57 |
It would be great if your explanation can be as detailed as possible. Also, partial solutions will also be much appreciated!
CodePudding user response:
You can conditionally mutate the Measurement
column using if_else()
, then pivot_longer()
, and then pivot_wider()
library(dplyr)
library(tidyr)
JanuaryData %>%
mutate(Measurement = if_else(Measurement=="Available Capacity", "Capability",Measurement)) %>%
pivot_longer(starts_with("Hour"), names_prefix = "Hour.", names_to = "Hour") %>%
pivot_wider(names_from="Measurement", values_from="value")
Output:
Delivery.Date Generator Fuel.Type Hour Capability Output Forecast
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2020-01-01 ABKENORA Hydro 1 13 13 NA
2 2020-01-01 ABKENORA Hydro 7 13 13 NA
3 2020-01-01 ABKENORA Hydro 13 13 13 NA
4 2020-01-01 ABKENORA Hydro 18 13 13 NA
5 2020-01-01 ABKENORA Hydro 24 13 13 NA
6 2020-01-01 ADELAIDE Wind 1 60 1 28
7 2020-01-01 ADELAIDE Wind 7 60 0 32
8 2020-01-01 ADELAIDE Wind 13 60 25 26
9 2020-01-01 ADELAIDE Wind 18 60 43 39
10 2020-01-01 ADELAIDE Wind 24 60 57 57
Explanation:
- use
if_else()
to changeMeasurement
values to "Capability" if the value ofMeasurement
is "Available Capacity" - pivot your data into long format to get all the
Hour.<x>
columns into a single column; note the use ofnames_prefix="Hour."
here to remove this prefix from each of the columns; you could additionally add the optionnames_transform = as.numeric
to thepivot_longer()
call in order to transform the newHour
column from character to numeric - pivot your data back to wide format, getting the names of the new columns from the "Measurement" column