Home > Software engineering >  What is the safest way to iterate through a panda data frame that has rows for the day broken into m
What is the safest way to iterate through a panda data frame that has rows for the day broken into m

Time:12-05

I have a .csv that I grabbed online here: https://marketplace.spp.org/file-browser-api/download/generation-mix-historical?path=/GenMix_2017.csv

The first column is date/time and is broken into 5 minute intervals (military time). I need to ensure that the dates are only for '2017' as there is some data at the end of the .csv from 2018. I want to be able to capture all the data but only in one hour increments.

For example in that .csv, this would be:

2017-01-01T06:00:00Z to 2017-01-01T06:55:00Z which is 12 rows.

This is only the case for 2017-01-01 that it starts at time: 6:00:00 all others start at 0:00:00

I was thinking that I might be able to just iterate ONLY for '2017' data by 12 step increments to get the hour blocks of time, and then once it has run 12*24 times it resets, not sure how to do this.

But also not sure if that would be a good idea it terms of future use cases, it may be that the times change or that there are missing data. Trying to ensure that this won't break if it is used in a few years. It's probably safe to say the company producing this data will continue to produce it the same way, but I guess you never know.

Here is what I have so far:

# puts api call into a pandas dataframe
energy_data = pd.read_csv('https://marketplace.spp.org/file-browser-api/download/generation-mix-historical?path=/GenMix_2017.csv')
    
# puts the date/time field into proper float64 format
energy_data['GMT MKT Interval'] = pd.to_datetime(energy_data['GMT MKT Interval'])
    
# ensures that the entire dataframe can be treated as time series data 
energy_data.set_index('GMT MKT Interval', inplace = True)

CodePudding user response:

Use resample_sum:

df = pd.read_csv('GenMix_2017.csv', parse_dates=['GMT MKT Interval'], 
                 index_col='GMT MKT Interval')

out = df.resample('H').sum()

Output:

>>> out
                            Coal Market   Coal Self   Diesel Fuel Oil    Hydro   Natural Gas  ...   Waste Disposal Services     Wind   Waste Heat   Other   Average Actual Load
GMT MKT Interval                                                                              ...                                                                              
2017-01-01 06:00:00 00:00       34104.7    159041.4               0.0   3220.5       35138.3  ...                     113.8  57517.0            0   431.3            303688.602
2017-01-01 07:00:00 00:00       32215.4    156570.6               0.0   3326.3       33545.2  ...                     132.9  63397.0            0   422.9            304163.427
2017-01-01 08:00:00 00:00       29604.7    152379.6               0.0   3246.0       33851.4  ...                     133.2  64230.5            0   358.1            300871.117
2017-01-01 09:00:00 00:00       28495.9    149474.0               0.0   2973.1       35171.5  ...                     131.9  65860.7            0   344.5            298908.514
2017-01-01 10:00:00 00:00       29304.8    146561.1               0.0   3161.2       34315.4  ...                     133.8  67882.8            0   340.9            299825.531
...                                 ...         ...               ...      ...           ...  ...                       ...      ...          ...     ...                   ...
2018-01-01 01:00:00 00:00       36071.3    216336.8              55.2  16093.1       93466.6  ...                     140.4  75547.5            0   327.6            463542.027
2018-01-01 02:00:00 00:00       35339.9    213596.9              55.2  14378.4       97397.7  ...                     114.6  75277.5            0   325.4            459252.079
2018-01-01 03:00:00 00:00       35051.4    217333.2              55.2  12334.3       96351.1  ...                     107.3  69376.7            0   328.1            453214.866
2018-01-01 04:00:00 00:00       35220.7    220868.9              53.2   8520.8       98404.2  ...                     116.9  60699.7            0   328.5            446139.366
2018-01-01 05:00:00 00:00       35392.1    223590.8              52.2   8980.9      103893.6  ...                     131.1  48453.0            0   329.8            439107.888

[8760 rows x 12 columns]
  • Related