Pandas find last row for each hour / minute in high frequency dataframe

Time:12-18

Assume a dataframe as follows. I'm looking to add a column to the df dataframe that takes the price of the current row and subtracts from it the price at the last index 5 minutes prior to the current hour/minute. I've attempted to reference a minute_df, read the current hour/minute, and pull the close price from the minute_df, but have not found a working solution. The df index is datetime64.

For example, at 06:27:12 it should take this row's price minus the close price at the last index within 06:22, since that is 5 minutes prior to 06:27. Every index within the minute 06:27 should reference this same close price for the calculation; once it turns to 06:28, it should subtract the price at the last index within 06:23.
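A minimal sketch of how a minute_df like the one below can be derived from the ticks themselves, assuming (the question does not say how minute_df was built) that each bar is labelled by the start of its minute, so the close of the 06:22 bar is the last tick inside 06:22:00-06:22:59. The tick prices here are made up for illustration.

```python
import pandas as pd

# Hypothetical tick data (made-up prices), indexed by a tz-aware DatetimeIndex
ticks = pd.DataFrame(
    {"Price": [8636, 8637, 8645, 8640, 8638]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:30:12.647597-05:00",
        "2022-10-05 05:30:59.900000-05:00",
        "2022-10-05 05:31:05.000000-05:00",
        "2022-10-05 05:31:40.000000-05:00",
    ]),
)

# One-minute OHLC bars, labelled by the start of each minute (the default for
# minute frequencies), so "close" is the last tick within that minute
minute_df = ticks["Price"].resample("1min").ohlc()
print(minute_df)
```

With this labelling, "the last index 5 minutes prior" is simply the close of the bar whose label is the current minute minus 5 minutes.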

df

TimeStamp                           Price   Q  hour min
2022-10-05 05:30:11.344618-05:00    8636    1   5   30
2022-10-05 05:30:12.647597-05:00    8637    1   5   30
2022-10-05 05:30:20.080559-05:00    8637    1   5   30
2022-10-05 05:30:21.267389-05:00    8637    2   5   30
2022-10-05 05:30:21.267952-05:00    8636    1   5   30

minute_df

TimeStamp                   open    high    low     close
2022-10-05 05:30:00-05:00   8636    8645    8635    8645
2022-10-05 05:31:00-05:00   8645    8647    8637    8638
2022-10-05 05:32:00-05:00   8639    8650    8639    8649
2022-10-05 05:33:00-05:00   8648    8652    8648    8649
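To make the example reproducible, the two sample frames above can be rebuilt as follows. This assumes TimeStamp is the index of both frames and that the hour and min columns are the local wall-clock hour and minute of each timestamp.

```python
import pandas as pd

# Tick-level frame from the question, indexed by TimeStamp
df = pd.DataFrame(
    {"Price": [8636, 8637, 8637, 8637, 8636], "Q": [1, 1, 1, 2, 1]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:30:12.647597-05:00",
        "2022-10-05 05:30:20.080559-05:00",
        "2022-10-05 05:30:21.267389-05:00",
        "2022-10-05 05:30:21.267952-05:00",
    ]),
)
df.index.name = "TimeStamp"
df["hour"] = df.index.hour    # local hour (5 here)
df["min"] = df.index.minute   # local minute (30 here)

# One-minute bars from the question, one row per minute
minute_df = pd.DataFrame(
    {
        "open": [8636, 8645, 8639, 8648],
        "high": [8645, 8647, 8650, 8652],
        "low": [8635, 8637, 8639, 8648],
        "close": [8645, 8638, 8649, 8649],
    },
    index=pd.date_range("2022-10-05 05:30:00-05:00", periods=4, freq="1min"),
)
minute_df.index.name = "TimeStamp"
```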

The expected output is a column in the df dataframe containing the current price minus the close price (i.e. the price at the last index) 5 minutes prior to the current minute, with NaN values until there are enough rows to look back that many periods.
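If minute_df is not strictly needed, a different technique, pandas.merge_asof, can look up the last tick directly. A minimal sketch: for a row in minute M, the key is the start of minute M-4, and the match is the last tick strictly before that key, which is the last tick inside minute M-5. Unlike a bar lookup, a minute with no trades falls back to the most recent earlier tick instead of giving NaN; rows with no earlier tick at all still get NaN. Only the first two ticks come from the question; the rest are made up.

```python
import pandas as pd

# Hypothetical ticks: first two from the question, the rest made up
df = pd.DataFrame(
    {"Price": [8636, 8636, 8650, 8652, 8640]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:30:21.267952-05:00",
        "2022-10-05 05:35:02.000000-05:00",
        "2022-10-05 05:35:30.000000-05:00",
        "2022-10-05 05:36:10.000000-05:00",
    ]),
)
df.index.name = "TimeStamp"

# Lookup key: start of the minute *after* the target minute (floor - 5 min + 1 min)
left = df.reset_index()
left["key"] = left["TimeStamp"].dt.floor("min") - pd.Timedelta(minutes=4)

# The same ticks, renamed so the columns don't collide with `left`
right = df["Price"].rename("ref_price").rename_axis("ref_ts").reset_index()

# Last tick strictly before the key; both frames are sorted by their keys
merged = pd.merge_asof(
    left, right,
    left_on="key", right_on="ref_ts",
    direction="backward",
    allow_exact_matches=False,
)

# merge_asof keeps the row order of `left`, so assign by position
df["price_change"] = (merged["Price"] - merged["ref_price"]).to_numpy()
print(df)
```

Here the 05:36:10 tick looks at 05:31, which has no trades, so it uses the 05:30:21 tick (8640 - 8636 = 4). Whether that fallback is desirable depends on the use case.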

df['price_change']

Answer:

Not sure if I understand correctly, but here's my attempt.

If TimeStamp is a column:

import pandas as pd

# Remove the seconds and microseconds (start of each row's minute)
floor_ts = df.TimeStamp.dt.floor("min")

# Timestamp of the minute bar 5 minutes earlier
last_index_5_ts = floor_ts - pd.Timedelta(5, unit="min")

# Create dict from minute_df TimeStamp to close price
ts_to_close_dict = dict(zip(minute_df.TimeStamp, minute_df.close))

# Look up the close; minutes missing from minute_df become NaN
close_price_v = last_index_5_ts.map(ts_to_close_dict)
df["price_change"] = df.Price - close_price_v
df
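A small variation, shown on the question's bars plus two made-up ticks: Series.map also accepts a Series as the mapper, so the intermediate dict can be skipped by mapping against minute_df's close column indexed by TimeStamp. This assumes the TimeStamp values in minute_df are unique.

```python
import pandas as pd

# minute_df with TimeStamp as a column (sample bars from the question)
minute_df = pd.DataFrame({
    "TimeStamp": pd.date_range("2022-10-05 05:30:00-05:00", periods=4, freq="1min"),
    "close": [8645, 8638, 8649, 8649],
})

# df with TimeStamp as a column; the 05:35 and 05:36 ticks are made up
df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:35:40.000000-05:00",
        "2022-10-05 05:36:05.000000-05:00",
    ]),
    "Price": [8636, 8650, 8640],
})

# Start of each row's minute, minus 5 minutes
last_index_5_ts = df["TimeStamp"].dt.floor("min") - pd.Timedelta(minutes=5)

# The mapper's index plays the role of the dict keys; no match gives NaN
close_by_ts = minute_df.set_index("TimeStamp")["close"]
df["price_change"] = df["Price"] - last_index_5_ts.map(close_by_ts)
print(df)
```

The 05:30 tick has no bar 5 minutes earlier, so it gets NaN, while 05:35:40 uses the 05:30 close (8650 - 8645 = 5).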

Same code, but if TimeStamp is the index:

import pandas as pd

# Start of each row's minute, minus 5 minutes
floor_ts = df.index.floor("min")
last_index_5_ts = floor_ts - pd.Timedelta(5, unit="min")
ts_to_close_dict = dict(zip(minute_df.index, minute_df.close))

# Index.map returns an Index with one value per row of df
close_price_v = last_index_5_ts.map(ts_to_close_dict)
df["price_change"] = df.Price - close_price_v
df
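For the index case, Series.reindex is another way to do the same lookup without building a dict: it returns the close for each shifted timestamp and NaN where minute_df has no such bar. A sketch on the question's bars plus two made-up ticks; .to_numpy() is used so the values line up with df by position rather than by the shifted labels.

```python
import pandas as pd

# Sample bars from the question, indexed by TimeStamp
minute_df = pd.DataFrame(
    {"close": [8645, 8638, 8649, 8649]},
    index=pd.date_range("2022-10-05 05:30:00-05:00", periods=4, freq="1min"),
)

# Ticks indexed by TimeStamp; the 05:35 and 05:36 ticks are made up
df = pd.DataFrame(
    {"Price": [8636, 8650, 8640]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:35:40.000000-05:00",
        "2022-10-05 05:36:05.000000-05:00",
    ]),
)

last_index_5_ts = df.index.floor("min") - pd.Timedelta(minutes=5)

# Look up each shifted timestamp in minute_df; missing bars become NaN
close_price_v = minute_df["close"].reindex(last_index_5_ts).to_numpy()
df["price_change"] = df["Price"] - close_price_v
print(df)
```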

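A few notes:

  • I'm not sure what you mean about handling NaN values, but rows whose 5-minutes-earlier bar is not in minute_df (including the first 5 minutes of data) get NaN automatically, because the lookup finds no match. If you need to forward fill / backward fill them, you can use Series.fillna (or the shortcuts Series.ffill / Series.bfill).
  • Some of the pandas functions used above (like floor) might be missing in older pandas versions.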
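One way to apply the forward-fill idea, assuming (this is a judgment call, not something the question specifies) that a minute without trades should count as having the previous minute's close: put the closes on a complete 1-minute grid with asfreq, forward fill them, and only then do the lookup. The bars below are hypothetical, with 05:31 deliberately missing.

```python
import pandas as pd

# Hypothetical minute closes with no bar at 05:31 (e.g. no trades that minute)
close = pd.Series(
    [8645, 8649, 8649],
    index=pd.to_datetime([
        "2022-10-05 05:30:00-05:00",
        "2022-10-05 05:32:00-05:00",
        "2022-10-05 05:33:00-05:00",
    ]),
    name="close",
)

# Insert the missing minutes as NaN, then carry the last close forward
close_filled = close.asfreq("1min").ffill()

# Made-up ticks: the first looks back to 05:25 (no data), the second to 05:31
df = pd.DataFrame(
    {"Price": [8636, 8640]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11-05:00",
        "2022-10-05 05:36:05-05:00",
    ]),
)
last_index_5_ts = df.index.floor("min") - pd.Timedelta(minutes=5)
df["price_change"] = df["Price"] - close_filled.reindex(last_index_5_ts).to_numpy()
print(df)
```

Rows before the first available bar still stay NaN, since forward filling never reaches backwards.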

EDIT: I didn't notice that df already has hour and min columns. You could use them, together with the date part of the timestamp, to calculate floor_ts (though I'm not sure whether that's easier or faster).
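A sketch of building floor_ts from the existing hour and min columns: take midnight of each timestamp with normalize() and add the hour and minute as timedeltas. With a fixed -05:00 offset like the sample this matches floor("min"); in a DST-observing time zone the addition is done in absolute time, so on transition days it could be off by an hour, which floor() avoids. The third tick is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"Price": [8636, 8637, 8636]},
    index=pd.to_datetime([
        "2022-10-05 05:30:11.344618-05:00",
        "2022-10-05 05:30:21.267952-05:00",
        "2022-10-05 05:31:02.000000-05:00",  # made-up extra tick
    ]),
)
df["hour"] = df.index.hour
df["min"] = df.index.minute

# Midnight of each row's date plus its hour and minute
floor_from_cols = (
    df.index.normalize()
    + pd.to_timedelta(df["hour"].to_numpy(), unit="h")
    + pd.to_timedelta(df["min"].to_numpy(), unit="min")
)
print(floor_from_cols)
```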
