Before I ask the question, I wish you all a Happy New Year.
I am quite new to Python and to pretty much any computational work. A similar question has been asked before, but the answer does not seem to work now, at least for me. I have a dataframe with the columns DateTime, Open, High, Low and Close, recorded at 1-minute intervals. How does one convert the 1-minute dataframe into, say, a 30-minute dataframe?
An extract of the data is shown below:
DateTime Open High Low Close
0 2017-01-02 02:00:00 1.05155 1.05197 1.05155 1.05190
1 2017-01-02 02:01:00 1.05209 1.05209 1.05177 1.05179
2 2017-01-02 02:02:00 1.05177 1.05198 1.05177 1.05178
3 2017-01-02 02:03:00 1.05188 1.05200 1.05188 1.05200
4 2017-01-02 02:04:00 1.05196 1.05204 1.05196 1.05203
The desired output is something like:
DateTime Open High Low Close
0 2017-01-02 02:00:00 1.05155 1.05197 1.05155 1.05190
1 2017-01-02 02:30:00 -the next 30 minutes-
2 2017-01-02 03:00:00
3 2017-01-02 03:30:00
4 2017-01-02 04:00:00
next day
2017-01-03 00:00:00
The 30-minute intervals should be aggregated as follows: Open is the first price in the timeframe, High is the highest price in the 30-minute timeframe, Low is the lowest price in the 30-minute timeframe, and Close is the last price before a new timeframe starts.
I tried to do this in Microsoft Excel using the following formula:
=INT(A3) + TIME(HOUR(A3), MINUTE(A3) + 30 ; 0)
Cell A2 had the DateTime 2017-01-02 02:00:00, so cell A3 produced 2017-01-02 02:30:00, but when I drag this down it does not account for the change into a new day; only the time keeps changing correctly. On top of that, I could not find a formula to aggregate the columns in the way I have described above.
A solution in either Microsoft Excel or Python would be greatly appreciated.
Answer:
Pandas has a really convenient function for this: pd.DataFrame.resample, but make sure that your "DateTime" column is actually in datetime format, not just a string (you may need to use pd.to_datetime to accomplish that if you haven't already).
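To check the dtype point above, here is a minimal sketch. The two-row frame is a hypothetical stand-in that reuses the first two timestamps and Open values from the question, stored as plain strings, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the shape of the question's data,
# with "DateTime" held as plain strings rather than timestamps.
df = pd.DataFrame({
    "DateTime": ["2017-01-02 02:00:00", "2017-01-02 02:01:00"],
    "Open": [1.05155, 1.05209],
})

# Before conversion the column is not a datetime column.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df["DateTime"])

# Convert the strings to real timestamps.
df["DateTime"] = pd.to_datetime(df["DateTime"])

is_datetime_after = pd.api.types.is_datetime64_any_dtype(df["DateTime"])
print(is_datetime_before, is_datetime_after)
```

If the second value printed is True, the column is ready to be used for resampling.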
Aggregation functions can be found here: https://pandas.pydata.org/docs/reference/resampling.html
Finally:
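import pandas as pd
df = pd.read_clipboard()  # Your dataframe here
df["DateTime"] = pd.to_datetime(df["DateTime"])  # make sure the column holds real timestamps
# How each column is aggregated within one 30-minute bin
agg_funcs = {
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
}
# "30min" is the 30-minute alias; the older "30T" spelling is deprecated in recent pandas versions
out = df.resample("30min", on="DateTime").agg(agg_funcs)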
Valid frequency aliases (and multiples of them) can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
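To illustrate what a multiple of an alias looks like in practice, here is a small sketch on made-up data (60 one-minute bars whose price simply counts up from 0, which is an assumption for easy checking and not real market data). The same frame is resampled once with "15min" and once with "30min".

```python
import pandas as pd

# Made-up data: 60 one-minute bars from 02:00 to 02:59, price = minute number.
times = pd.date_range("2017-01-02 02:00:00", periods=60, freq="min")
prices = [float(i) for i in range(60)]
df = pd.DataFrame({
    "DateTime": times,
    "Open": prices,
    "High": prices,
    "Low": prices,
    "Close": prices,
})

agg_funcs = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

# A multiple is written as a number in front of the alias.
bars_15 = df.resample("15min", on="DateTime").agg(agg_funcs)
bars_30 = df.resample("30min", on="DateTime").agg(agg_funcs)

print(len(bars_15), len(bars_30))
print(bars_30)
```

With one hour of data this should give four 15-minute bars and two 30-minute bars, each labelled by the start of its interval.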
PS: You do not need to use the on="DateTime" keyword argument if you make your "DateTime" column the index (i.e. a pd.DatetimeIndex), which tends to be very convenient (and also seems natural for your data) for other tasks you might want to do.
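A sketch of the index-based variant, again on made-up data (prices counting up from 0 are an assumption for illustration). The sample deliberately runs across midnight, since the change of day was the sticking point in the Excel attempt.

```python
import pandas as pd

# Made-up data: 60 one-minute bars from 23:30 on 2 January to 00:29 on 3 January.
times = pd.date_range("2017-01-02 23:30:00", periods=60, freq="min")
prices = [float(i) for i in range(60)]
df = pd.DataFrame({
    "DateTime": times,
    "Open": prices,
    "High": prices,
    "Low": prices,
    "Close": prices,
})

# Make "DateTime" the index, so resample needs no on= argument.
df = df.set_index("DateTime")

agg_funcs = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
out = df.resample("30min").agg(agg_funcs)

print(out)
```

The bins follow the timestamps themselves, so the second bar is labelled 2017-01-03 00:00:00 without any special handling. If the real data has gaps (for example overnight or at weekends), the empty 30-minute bins in between show up as NaN rows, which out.dropna() would remove.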