Assume dataframes as follows. I'm looking to add a column to the df dataframe that takes the price for the current row and subtracts from it the close price at the last index 5 minutes prior to the current hour/minute. I've attempted to reference a minute_df, read the current hour/minute, and pull the close price from the minute_df, but have not got a working solution. The df index is datetime64.
For example, at 06:27:12, it should take this row's price minus the close price at the last index from 06:22, as this is 5 minutes prior to 06:27. Every index within the minute 06:27 should reference this same close price for the calculation, until it turns to 06:28, at which point it should subtract the close price from the last index at 06:23.
df
TimeStamp Price Q hour min
2022-10-05 05:30:11.344618-05:00 8636 1 5 30
2022-10-05 05:30:12.647597-05:00 8637 1 5 30
2022-10-05 05:30:20.080559-05:00 8637 1 5 30
2022-10-05 05:30:21.267389-05:00 8637 2 5 30
2022-10-05 05:30:21.267952-05:00 8636 1 5 30
minute_df
TimeStamp open high low close
2022-10-05 05:30:00-05:00 8636 8645 8635 8645
2022-10-05 05:31:00-05:00 8645 8647 8637 8638
2022-10-05 05:32:00-05:00 8639 8650 8639 8649
2022-10-05 05:33:00-05:00 8648 8652 8648 8649
Expected output is a column in the df dataframe containing the current price minus the close price at the last index 5 minutes prior to the current minute, with NaN values until there are enough rows to look back that many periods.
df['price_change']
CodePudding user response:
Not sure if I understand correctly, but here's my try.
If TimeStamp is a column:
# Remove the seconds and microseconds
floor_ts = df.TimeStamp.dt.floor("min")
# Get last 5 minute timestamp
last_index_5_ts = floor_ts - pd.Timedelta(5, unit="min")
# Create dict from minute_df TimeStamp to close price
ts_to_close_dict = dict(zip(minute_df.TimeStamp, minute_df.close))
close_price_v = last_index_5_ts.map(ts_to_close_dict)
df["price_change"] = df.Price - close_price_v
df
Same code, but if TimeStamp is an index:
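# Remove the seconds and microseconds
floor_ts = df.index.floor("min")
# Get last 5 minute timestamp
last_index_5_ts = floor_ts - pd.Timedelta(5, unit="min")
ts_to_close_dict = dict(zip(minute_df.index, minute_df.close))
# Convert to a plain array so the subtraction below is purely positional
close_price_v = last_index_5_ts.map(ts_to_close_dict).to_numpy()
df["price_change"] = df.Price - close_price_v
df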
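To make the lookup easier to check, here is a small self-contained sketch of the index version. The ticks and minute bars are made up for illustration (they are not the asker's data), and it uses reindex instead of the dict/map step, which should give the same result as long as the minute_df index is unique.

```python
from datetime import timedelta, timezone

import pandas as pd

TZ = timezone(timedelta(hours=-5))  # fixed -05:00 offset, as in the question

# Made-up minute bars, indexed by the start of each minute.
minute_df = pd.DataFrame(
    {"close": [8645, 8638, 8649, 8649, 8651, 8650, 8648]},
    index=pd.date_range("2022-10-05 05:30", periods=7, freq="min", tz=TZ),
)

# Made-up ticks; the first one has no bar 5 minutes earlier.
tick_times = pd.to_datetime(
    [
        "2022-10-05 05:30:11.344618",
        "2022-10-05 05:35:05.100000",
        "2022-10-05 05:35:40.250000",
        "2022-10-05 05:36:10.000000",
    ]
).tz_localize(TZ)
df = pd.DataFrame({"Price": [8636, 8650, 8652, 8640]}, index=tick_times)

# Minute of each tick, shifted back 5 minutes.
lookup_ts = df.index.floor("min") - pd.Timedelta(minutes=5)

# reindex returns NaN for any minute that is missing from minute_df.
ref_close = minute_df["close"].reindex(lookup_ts).to_numpy()
df["price_change"] = df["Price"] - ref_close
print(df)
```

The 05:30 tick gets NaN because there is no 05:25 bar, the two 05:35 ticks both use the 05:30 close, and the 05:36 tick uses the 05:31 close.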
A few notes:
- I'm not sure what you mean about handling NaN values, but if you need to forward fill or backward fill them you can use ffill / bfill (or fillna) on the column.
- Some of the pandas functions used above (like floor) might be missing in older pandas versions.
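On the NaN point, a minimal sketch of the usual options, using a made-up price_change Series rather than the real column. If the NaN rows are meant to stay until there is enough history, none of this is needed.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for df["price_change"].
price_change = pd.Series([np.nan, np.nan, 5.0, np.nan, 2.0])

# Forward fill: leading NaNs stay NaN because there is nothing before them.
filled_forward = price_change.ffill()

# Replace every NaN with a constant.
filled_zero = price_change.fillna(0)

# Or drop the rows that have no value.
dropped = price_change.dropna()

print(filled_forward.tolist())
print(filled_zero.tolist())
print(dropped.tolist())
```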
EDIT:
I didn't notice that df already has hour and min columns. You could use them to calculate floor_ts (though I'm not sure it's any easier or faster).
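A rough sketch of what building floor_ts from those columns could look like, assuming TimeStamp is the index, the columns are named hour and min as in the question, and the timezone is a fixed offset (with a DST zone, adding hours to midnight could be off on transition days). The rows here are made up. Note that the column has to be accessed as df["min"], since df.min is the DataFrame method.

```python
from datetime import timedelta, timezone

import pandas as pd

TZ = timezone(timedelta(hours=-5))  # fixed -05:00 offset (assumption)

# Made-up ticks with the hour/min columns from the question.
idx = pd.to_datetime(
    [
        "2022-10-05 05:30:11.344618",
        "2022-10-05 05:30:21.267389",
        "2022-10-05 05:31:02.000000",
    ]
).tz_localize(TZ)
df = pd.DataFrame(
    {"Price": [8636, 8637, 8640], "hour": [5, 5, 5], "min": [30, 30, 31]},
    index=idx,
)

# Midnight of each tick's date plus the hour and minute columns.
floor_from_cols = (
    df.index.normalize()
    + pd.to_timedelta(df["hour"].to_numpy(), unit="h")
    + pd.to_timedelta(df["min"].to_numpy(), unit="min")
)

# Compare against the direct approach.
same = (floor_from_cols == df.index.floor("min")).all()
print(same)
```

Since floor("min") is a single call, the column-based version is mostly useful if the index is not available.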