Home > Net >  Converting 1-minute market data into 5-minute data
Converting 1-minute market data into 5-minute data

Time:05-10

This is the data frame I have:

1

This is how I need it to be:

2

This is what .groupby(pd.Grouper(freq='5min')).agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}) gives me:

3

  1. I understand that data with 5-minute intervals can instead be procured, but that would defeat the purpose of me trying to get a little better at python.
  2. I intend to replicate this procedure at (10, 15, 30)-minute timeframes on the same data.
  3. This should provide a similar data frame, only with different values:
import pandas as pd
import yfinance as yf
import datetime as dt
data = yf.download("INFY.NS", end = dt.datetime.today(), start = dt.datetime.today() - dt.timedelta(days=1), interval="1m")
data.index = pd.to_datetime(data.index)
data = data.between_time('09:31','15:30')

Directives on ways to paste python data frames are welcome.

EDIT: Adding replicable code and some clarity (and a solve!)

  1. I intend to group the dataframe below to 5-min frequencies:
import pandas as pd
import datetime as dt
date = pd.to_datetime(['9:31', '9:32', '9:33', '9:34', '9:35', '9:36', '9:37', '9:38', '9:39', '9:40'])
a = pd.DataFrame({'Open': [1,2,3,4,5,6,7,8,9,10],
                  'High': [1,2,3,4,5,6,7,8,9,10],
                  'Low': [0,1,2,3,4,5,6,7,8,9],
                  'Close': [0,1,2,3,4,5,6,7,8,9],
                  'Volume': [10,11,12,13,14,15,16,17,18,19]}, index  = date)
  1. I tried to use pandas.Grouper as below, but it would aggregate data to a different bucket:
a.groupby(pd.Grouper(freq='5min')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
  1. With this, data indexed [09:31 - 09:34] is grouped/aggregated to a single row indexed [09:30]; however, I need data indexed [09:31-09:35] to be grouped to [09:35]

  2. A solution I found, thanks to @Daniel Weigel and pandas documentation:

b = a.groupby(pd.Grouper(freq='5min',origin='start',label='right')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
b.index  = pd.Timedelta(minutes=-1)

Welcome any suggestions/modifications to improve the solution! :)

CodePudding user response:

I'm not sure I completely understand your issue, but why dont you change the index timestamp after your grouper operation ? Something like

tmp.index=tmp.index.floor(freq='5min') pd.Timedelta(minutes=5)

So you make sure to floor your existing timestamp to the nearest 5 minutes of the bottom, then add 5 minutes to that..and same method for other frequencies. So with your existing code, that is

tmp=data.groupby(pd.Grouper(freq='5min')).agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})

tmp.index=tmp.index.floor(freq='5min') pd.Timedelta(minutes=5)
tmp
  • Related