have a dataframe with 1 minute timestamp of open, high, low, close, volume for a token. using expanding or resample function, one can get a new dataframe based on the timeinterval. in my case its 1 day time interval.
i am looking to get the above output in the original dataframe. please assist in the same.
original dataframe:
desired dataframe:
Here "date_1d" is the time interval for my use case. i used expanding function but as the value changes in "date_1d" column, expanding function works on the whole dataframe
''' df["high_1d"] = df["high"].expanding().max()
df["low_1d"] = df["low"].expanding().min()
df["volume_1d"] = df["volume"].expanding().min() '''
then the next challenge was how to find Open and Close based on "date_1d" column
Please assist or ask more questions, if not clear on my desired output.
Fyi - data is huge for 5 years 1 minute data for 100 tokens
thanks in advance
Sukhwant
CodePudding user response:
I'm not sure if I understand it right but for me it looks like you want to groupby
each day and calculate first
last
min
max
for them.
Is the column date_1d
already there ?
If not:
df["date_1d"] = df["date"].dt.strftime('%Y%m%d')
For the calculations:
df["open_1d"] = df.groupby("date_1d")['open'].transform('first')
df["high_1d"] = df.groupby("date_1d")['high'].transform('max')
df["low_1d"] = df.groupby("date_1d")['low'].transform('min')
df["close_1d"] = df.groupby("date_1d")['close'].transform('last')
EDIT: Have a look in your code if this works like you expect it (till we have some of your data I can only guess, sorry :D )
df['high_1d'] = df.groupby('date_1d')['high'].expanding().max().values
It groups the data per "date_1d" but in the group only consider row by row (and the above rows)