This is my first time posting, so excuse me if my post isn't the best...
I previously looked at this question to combine my 5-minute intervals into hourly intervals and show the volume for each hour: Combine 5 minute timeframes into hourly.
Here is a snippet of the CSV data I am working with. The delimiter is a comma. The data goes all the way from June 2017 through to the end of 2021.
datestamp | timestamp | det_vol |
---|---|---|
2017-06-01 | 13:45 | 582 |
2017-06-01 | 13:50 | 592 |
2017-06-01 | 13:55 | 578 |
2017-06-01 | 14:00 | 544 |
2017-06-01 | 14:05 | 521 |
I read the data in using pd.read_csv() and filtered the values to obtain all the volume readings from August 2017:
vds2 = vds2[(vds2['datestamp'].dt.year == 2017) & (vds2['datestamp'].dt.month == 8)]
vds2 = vds2.rename(columns={'det_vol': 'Volume'})
vds2['Date'] = pd.to_datetime(vds2['datestamp']).dt.date
vds2['Time'] = pd.to_datetime(vds2['timestamp']).dt.time
vds2 = vds2[['Date','Time','Volume']]
vds2 = vds2.reset_index(drop=True)
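(For reference, the read step looked roughly like this; the file name below is just a placeholder, but parse_dates is what lets the .dt filter above work on datestamp.)
import pandas as pd
vds2 = pd.read_csv('vds_data.csv', parse_dates=['datestamp'])  # placeholder file name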
All of that got me to my final dataframe:
Date Time Volume
0 2017-08-01 00:00 57
1 2017-08-01 00:05 97
2 2017-08-01 00:10 79
3 2017-08-01 00:15 63
4 2017-08-01 00:20 87
... ... ... ...
However, when I try to implement the following, I get an error on the first line:
vds2.index = pd.to_datetime(vds2['Date'] + ' ' + vds2['Time'])
vds2 = vds2.resample('H').agg({'Volume':'sum'})
TypeError: unsupported operand type(s) for +: 'DatetimeArray' and 'str'
I thought maybe the issue was the format of the Date and Time columns, which is why I tried implementing:
vds2['Date'] = pd.to_datetime(vds2['Date']).dt.date
vds2['Time'] = pd.to_datetime(vds2['Time']).dt.time
But I still got a similar error...
TypeError: unsupported operand type(s) for +: 'datetime.date' and 'str'
I'm sure this is quite simple but would appreciate any help. Thanks!
CodePudding user response:
>>> df.index = pd.to_datetime(df['datestamp'] + ' ' + df['timestamp'])
>>> df.resample('H').det_vol.sum()
2017-06-01 13:00:00 1752
2017-06-01 14:00:00 1065
Freq: H, Name: det_vol, dtype: int64
You need to have the datetime in your index for resample to work.
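As a side note, resample() can also take the datetime column directly through its on= argument, so overwriting the index is optional. A rough sketch, using the column names from your sample:
>>> df['dt'] = pd.to_datetime(df['datestamp'] + ' ' + df['timestamp'])
>>> df.resample('H', on='dt').det_vol.sum()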
CodePudding user response:
After testing out your additional code and creating a small CSV file out of your sample, I got the same unwanted error. I am not certain, but from what I could see, the following line of code was storing your timestamp as plain Python time objects rather than pandas datetimes or strings.
vds2['Time'] = pd.to_datetime(vds2['timestamp']).dt.time
The string concatenation inside the subsequent pd.to_datetime call was then falling over, because you cannot add a string to those objects.
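If you did want to keep those converted Date/Time columns, one workaround (just a sketch, not what I ended up doing below) is to cast them back to strings before concatenating, so pd.to_datetime gets a plain string to parse:
# Date holds datetime.date objects and Time holds datetime.time objects here,
# so convert both back to strings before combining them.
vds2.index = pd.to_datetime(vds2['Date'].astype(str) + ' ' + vds2['Time'].astype(str))
vds2 = vds2.resample('H').agg({'Volume': 'sum'})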
When I had been trying out various scenarios and getting output, I was not converting the datestamp or timestamp fields at all; I just left them as they came in from the CSV file and renamed the columns after importing. Following is my sample code, based upon your program but without any data element conversion; it just renames the columns.
import pandas as pd

vds2 = pd.read_csv('DateTime.csv')
print(vds2)

#vds2 = vds2[(vds2['datestamp'].dt.year == 2017) & (vds2['datestamp'].dt.month == 8)]

# Just rename the columns; the values stay exactly as they were read from the CSV
vds2 = vds2.rename(columns={'det_vol': 'Volume'})
vds2 = vds2.rename(columns={'datestamp': 'Date'})
vds2 = vds2.rename(columns={'timestamp': 'Time'})
vds2 = vds2[['Date','Time','Volume']]
vds2 = vds2.reset_index(drop=True)

# Build the DatetimeIndex from the raw string columns
vds2.index = pd.to_datetime(vds2['Date'] + ' ' + vds2['Time'])
vds2.index.name = 'Date and Time'

# Resample to hourly totals and keep only the non-empty hours
vds2 = vds2.resample('H').agg({'Volume':'sum'})
rslt_df = vds2[vds2['Volume'] > 0]
print(rslt_df)
Running this program over the sample data resulted in the following output on my terminal.
@Una:~/Python_Programs/DateTime$ python3 DateTime.py
index datestamp timestamp det_vol
0 0 6/1/2017 13:45 582
1 1 6/1/2017 13:50 592
2 2 6/1/2017 13:55 578
3 3 6/1/2017 14:00 544
4 4 6/1/2017 14:05 521
Volume
Date and Time
2017-06-01 13:00:00 1752
2017-06-01 14:00:00 1065
Perhaps someone above my pay grade can better explain exactly what was happening when your program converted the time data from your data frame, but it looks like you don't even need to go through those conversions at that point. You can just rename the columns to your liking and continue summarizing and processing.
Hope that helps.
Regards.