This is the data frame I have:
This is how I need it to be:
This is what .groupby(pd.Grouper(freq='5min')).agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"})
gives me:
- I understand that data with 5-minute intervals can instead be procured, but that would defeat the purpose of me trying to get a little better at python.
- I intend to replicate this procedure at (10, 15, 30)-minute timeframes on the same data.
- This should provide a similar data frame, only with different values:
import pandas as pd
import yfinance as yf
import datetime as dt
data = yf.download("INFY.NS", end = dt.datetime.today(), start = dt.datetime.today() - dt.timedelta(days=1), interval="1m")
data.index = pd.to_datetime(data.index)
data = data.between_time('09:31','15:30')
Directives on ways to paste python data frames are welcome.
EDIT: Adding replicable code and some clarity (and a solve!)
- I intend to group the dataframe below to 5-min frequencies:
import pandas as pd
import datetime as dt
date = pd.to_datetime(['9:31', '9:32', '9:33', '9:34', '9:35', '9:36', '9:37', '9:38', '9:39', '9:40'])
a = pd.DataFrame({'Open': [1,2,3,4,5,6,7,8,9,10],
'High': [1,2,3,4,5,6,7,8,9,10],
'Low': [0,1,2,3,4,5,6,7,8,9],
'Close': [0,1,2,3,4,5,6,7,8,9],
'Volume': [10,11,12,13,14,15,16,17,18,19]}, index = date)
- I tried to use pandas.Grouper as below, but it would aggregate data to a different bucket:
a.groupby(pd.Grouper(freq='5min')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
With this, data indexed [09:31 - 09:34] is grouped/aggregated to a single row indexed [09:30]; however, I need data indexed [09:31-09:35] to be grouped to [09:35]
A solution I found, thanks to @Daniel Weigel and pandas documentation:
b = a.groupby(pd.Grouper(freq='5min',origin='start',label='right')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
b.index = pd.Timedelta(minutes=-1)
Welcome any suggestions/modifications to improve the solution! :)
CodePudding user response:
I'm not sure I completely understand your issue, but why dont you change the index timestamp after your grouper operation ? Something like
tmp.index=tmp.index.floor(freq='5min') pd.Timedelta(minutes=5)
So you make sure to floor your existing timestamp to the nearest 5 minutes of the bottom, then add 5 minutes to that..and same method for other frequencies. So with your existing code, that is
tmp=data.groupby(pd.Grouper(freq='5min')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
tmp.index=tmp.index.floor(freq='5min') pd.Timedelta(minutes=5)
tmp