Pandas time since last trade

Time:09-13

I have a pandas DataFrame containing symbols and times at which a transaction took place. There’s also a third column indicating whether the transaction was a purchase or a sale. For example,

df = pd.DataFrame({"col1": ["AAPL", "AAPL", "GOOG", "AAPL", "GOOG"], "col2": [1, 2, 3, 6, 7], "col3": ["BUY", "SELL", "BUY", "BUY", "BUY"]})

I would like to add a new column representing the time since the previous transaction on the same symbol, regardless of type. In the example above, this column would be

[NaN, 1, NaN, 4, 4]

The first entry for each symbol is always NaN because there’s no earlier transaction on that symbol. The second entry is 1 because AAPL last traded at time 1, and 2 - 1 = 1 is the difference between the current time and the previous trade time.

I’d also like another column indicating whether the previous transaction on the same symbol was a BUY or a SELL. In this example, that column would be

[NaN, BUY, NaN, SELL, BUY]

Can someone please help me figure out how to do this?

I think I need a merge_asof to match each time to the most recent earlier time. But since this has to be computed independently for each symbol, perhaps I also need a groupby? I’m struggling to get the logic right. Thanks.

CodePudding user response:

Assuming this input, with rows already sorted by time (col2):

   col1  col2  col3
0  AAPL     1   BUY
1  AAPL     2  SELL
2  GOOG     3   BUY
3  AAPL     6   BUY
4  GOOG     7   BUY

You can use groupby.shift/diff:

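g = df.groupby('col1')          # one group per symbol
df['col4'] = g['col3'].shift()  # type of the previous transaction on the same symbol
df['col5'] = g['col2'].diff()   # time since the previous transaction on the same symbol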

Output:

   col1  col2  col3  col4  col5
0  AAPL     1   BUY   NaN   NaN
1  AAPL     2  SELL   BUY   1.0
2  GOOG     3   BUY   NaN   NaN
3  AAPL     6   BUY  SELL   4.0
4  GOOG     7   BUY   BUY   4.0
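
Because shift and diff only look at the previous row within each group, the result depends on the rows being in time order. If that isn't guaranteed, a minimal self-contained sketch could sort first. The shuffled input below is made up for illustration (same trades as the question, different row order), and the column names col1/col2/col3 follow the assumed input above:

```python
import pandas as pd

# Same trades as in the question, deliberately shuffled (illustrative data).
df = pd.DataFrame({
    "col1": ["GOOG", "AAPL", "AAPL", "GOOG", "AAPL"],
    "col2": [7, 6, 1, 3, 2],
    "col3": ["BUY", "BUY", "BUY", "BUY", "SELL"],
})

# shift/diff compare each row with the previous row of its group,
# so put the rows in time order before grouping.
df = df.sort_values("col2", kind="stable").reset_index(drop=True)

g = df.groupby("col1")
df["col4"] = g["col3"].shift()  # previous transaction type per symbol
df["col5"] = g["col2"].diff()   # time since previous transaction per symbol

print(df)
```

After the sort, the frame matches the input assumed above, so the same groupby calls apply unchanged.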