I have a .csv that I grabbed online here: https://marketplace.spp.org/file-browser-api/download/generation-mix-historical?path=/GenMix_2017.csv
The first column is date/time and is broken into 5-minute intervals (24-hour time). I need to ensure that the dates are only for '2017', as there is some data from 2018 at the end of the .csv. I want to capture all the data, but only in one-hour increments.
For example in that .csv, this would be:
2017-01-01T06:00:00Z
to 2017-01-01T06:55:00Z
which is 12 rows.
Only 2017-01-01 starts at 06:00:00; all other days start at 00:00:00.
I was thinking I might be able to iterate over only the '2017' data in 12-row increments to get the hour blocks, resetting once it has run 12*24 times, but I'm not sure how to do this.
I'm also not sure whether that would be a good idea in terms of future use cases: the times may change, or there may be missing data. I'm trying to ensure this won't break if it's used in a few years. It's probably safe to say the company producing this data will continue to produce it the same way, but I guess you never know.
Here is what I have so far:
# puts api call into a pandas dataframe
energy_data = pd.read_csv('https://marketplace.spp.org/file-browser-api/download/generation-mix-historical?path=/GenMix_2017.csv')
# parses the date/time field into datetime64 format
energy_data['GMT MKT Interval'] = pd.to_datetime(energy_data['GMT MKT Interval'])
# ensures that the entire dataframe can be treated as time series data
energy_data.set_index('GMT MKT Interval', inplace = True)
CodePudding user response:
Use resample + sum:
# parse the timestamp column and use it as the index in one step
df = pd.read_csv('GenMix_2017.csv', parse_dates=['GMT MKT Interval'],
                 index_col='GMT MKT Interval')
# aggregate the 5-minute rows into hourly sums
out = df.resample('H').sum()
Output:
>>> out
Coal Market Coal Self Diesel Fuel Oil Hydro Natural Gas ... Waste Disposal Services Wind Waste Heat Other Average Actual Load
GMT MKT Interval ...
2017-01-01 06:00:00+00:00    34104.7   159041.4             0.0   3220.5      35138.3  ...                    113.8  57517.0          0  431.3           303688.602
2017-01-01 07:00:00+00:00    32215.4   156570.6             0.0   3326.3      33545.2  ...                    132.9  63397.0          0  422.9           304163.427
2017-01-01 08:00:00+00:00    29604.7   152379.6             0.0   3246.0      33851.4  ...                    133.2  64230.5          0  358.1           300871.117
2017-01-01 09:00:00+00:00    28495.9   149474.0             0.0   2973.1      35171.5  ...                    131.9  65860.7          0  344.5           298908.514
2017-01-01 10:00:00+00:00    29304.8   146561.1             0.0   3161.2      34315.4  ...                    133.8  67882.8          0  340.9           299825.531
...                              ...        ...             ...      ...          ...  ...                      ...      ...        ...    ...                  ...
2018-01-01 01:00:00+00:00    36071.3   216336.8            55.2  16093.1      93466.6  ...                    140.4  75547.5          0  327.6           463542.027
2018-01-01 02:00:00+00:00    35339.9   213596.9            55.2  14378.4      97397.7  ...                    114.6  75277.5          0  325.4           459252.079
2018-01-01 03:00:00+00:00    35051.4   217333.2            55.2  12334.3      96351.1  ...                    107.3  69376.7          0  328.1           453214.866
2018-01-01 04:00:00+00:00    35220.7   220868.9            53.2   8520.8      98404.2  ...                    116.9  60699.7          0  328.5           446139.366
2018-01-01 05:00:00+00:00    35392.1   223590.8            52.2   8980.9     103893.6  ...                    131.1  48453.0          0  329.8           439107.888
[8760 rows x 12 columns]
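Note that the result above still contains the 2018 rows from the end of the file. Since the index is a DatetimeIndex, you can drop them with partial-string indexing (df.loc['2017']) before resampling. Because resample groups by the timestamps themselves rather than a fixed 12-row stride, this approach also tolerates missing intervals or a changed start time. A minimal sketch using a small made-up frame that spans the year boundary (the 'Wind' column and the timestamps here are illustrative, not taken from the real file):

```python
import pandas as pd

# Illustrative 5-minute data crossing from 2017 into 2018
idx = pd.date_range('2017-12-31 22:00', '2018-01-01 02:00',
                    freq='5min', tz='UTC')
df = pd.DataFrame({'Wind': 1.0}, index=idx)
df.index.name = 'GMT MKT Interval'

# Keep only 2017 via partial-string indexing, then aggregate to hours
out = df.loc['2017'].resample('H').sum()
```

Here out has two rows (22:00 and 23:00 on 2017-12-31), each summing twelve 5-minute values; every 2018 timestamp is excluded before the resample, so the grouping never depends on the file starting at a particular time or containing a complete day.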