Python: Multiply columns of two DataFrames based on the nearest index

Time:03-10

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame()
df1['time'] = ['2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32', '2022-01-02 00:12:02',
        '2022-01-02 11:23:16', '2022-01-02 18:13:30', '2022-01-02 21:23:52', '2022-01-02 19:43:12']
df1['price'] = [1,2,3,4,5,6,7,8]
df1['ticker'] = ['a','b','a','b','c','c','a','e']
df2 = pd.DataFrame()
df2['time'] = ['2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02', '2022-01-02 00:17:42',
        '2022-01-02 11:18:16', '2022-01-02 18:13:39', '2022-01-02 21:24:12', '2022-01-02 19:43:12']
df2['amount'] = [10,12,13,14,15,16,17,18]
df2['ticker']=['a','b','b','c','d','e','a','c']

df1:
    time                     price  ticker
 0  2022-01-01 17:03:32      1      a
 1  2022-01-01 17:04:30      2      b
 2  2022-01-01 17:04:32      3      a
 3  2022-01-02 00:12:02      4      b
 4  2022-01-02 11:23:16      5      c
 5  2022-01-02 18:13:30      6      c
 6  2022-01-02 21:23:52      7      a
 7  2022-01-02 19:43:12      8      e     

df2:              
    time                     amount  ticker
 0  2022-01-01 17:03:50      10      a
 1  2022-01-01 17:06:52      12      b
 2  2022-01-01 17:07:02      13      b
 3  2022-01-02 00:17:42      14      c
 4  2022-01-02 11:18:16      15      d
 5  2022-01-02 18:13:39      16      e
 6  2022-01-02 21:24:12      17      a
 7  2022-01-02 19:43:12      18      c

What I am trying to do is multiply the "price" column in df1 with the "amount" column of df2 based on the nearest "time" and where df1.ticker=df2.ticker to get something like this:

df:
    time                   balance    ticker
    2022-01-01 17:03:50    10         a
    2022-01-01 17:06:52    24         b
    2022-01-02 21:24:12    119        a
    ....

where df['balance']=df1['price']*df2['amount']

How can this be done pythonically without having several for loops/if statements?

CodePudding user response:

Use merge_asof with direction='nearest' and by='ticker' to build a new DataFrame; the new column can then be created from it:

df1['time'] = pd.to_datetime(df1['time'])
df2['time'] = pd.to_datetime(df2['time'])

df = pd.merge_asof(df2.sort_values('time'), 
                   df1.sort_values('time'), 
                   on='time', 
                   by='ticker', 
                   direction='nearest')

df['balance'] = df['price']*df['amount']
print (df)
                 time  amount ticker  price  balance
0 2022-01-01 17:03:50      10      a    1.0     10.0
1 2022-01-01 17:06:52      12      b    2.0     24.0
2 2022-01-01 17:07:02      13      b    2.0     26.0
3 2022-01-02 00:17:42      14      c    5.0     70.0
4 2022-01-02 11:18:16      15      d    NaN      NaN
5 2022-01-02 18:13:39      16      e    8.0    128.0
6 2022-01-02 19:43:12      18      c    6.0    108.0
7 2022-01-02 21:24:12      17      a    7.0    119.0
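
The result above keeps every row of df2, including the unmatched ticker d (NaN) and matches that are hours apart. As a minimal sketch, assuming a 5-minute cut-off is acceptable (that value is an assumption, not something stated in the question), the tolerance parameter of merge_asof can limit how far apart two timestamps may be, and dropna can remove the rows that found no partner:

```python
import pandas as pd

# Same sample data as in the question, with 'time' parsed up front.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32',
        '2022-01-02 00:12:02', '2022-01-02 11:23:16', '2022-01-02 18:13:30',
        '2022-01-02 21:23:52', '2022-01-02 19:43:12']),
    'price': [1, 2, 3, 4, 5, 6, 7, 8],
    'ticker': ['a', 'b', 'a', 'b', 'c', 'c', 'a', 'e'],
})
df2 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02',
        '2022-01-02 00:17:42', '2022-01-02 11:18:16', '2022-01-02 18:13:39',
        '2022-01-02 21:24:12', '2022-01-02 19:43:12']),
    'amount': [10, 12, 13, 14, 15, 16, 17, 18],
    'ticker': ['a', 'b', 'b', 'c', 'd', 'e', 'a', 'c'],
})

# Assumed cut-off: ignore df1 rows more than 5 minutes away from the df2 row.
max_gap = pd.Timedelta('5min')

merged = pd.merge_asof(df2.sort_values('time'),
                       df1.sort_values('time'),
                       on='time',
                       by='ticker',
                       direction='nearest',
                       tolerance=max_gap)

# Drop rows without a match, compute the balance, keep the requested columns.
result = (merged.dropna(subset=['price'])
                .assign(balance=lambda d: d['price'] * d['amount'])
                [['time', 'balance', 'ticker']]
                .reset_index(drop=True))
print(result)
```

With the sample data this leaves four rows (balances 10, 24, 26 and 119); a larger tolerance lets the more distant matches back in.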

CodePudding user response:

  1. Use compare to find the rows where df2 and df1 have the same ticker at the same index position, then select those rows from df2 with take
tkcmp = df2.ticker.compare(df1.ticker, keep_shape=True)
# rows where the tickers are equal come back as NaN in both columns
idx_tk = tkcmp.index[tkcmp.isnull().any(axis=1)].tolist()
df = df2.take(idx_tk)
  2. Convert df.time from string to datetime and get the index of the nearest time in df1 (this search does not filter by ticker)
t1 = pd.to_datetime(df1.time)
idx_price = [(t1 - i).abs().idxmin() for i in pd.to_datetime(df.time)]

  3. Multiply price by the amount
df['balance'] = df.amount * df1.price.take(idx_price).values

df
                  time  amount ticker  balance
0  2022-01-01 17:03:50      10      a       10
1  2022-01-01 17:06:52      12      b       36
6  2022-01-02 21:24:12      17      a      119
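
Row 1 above shows 36 rather than 24 because the nearest df1 time is picked across all tickers (17:04:32, price 3). As a rough sketch of how a manual search could respect the ticker as well, the helper below (nearest_price is a hypothetical name, not part of pandas) filters df1 by ticker before looking for the closest timestamp. It loops row by row, so it is only meant as a cross-check on small data:

```python
import pandas as pd

# Same sample data as in the question, with 'time' parsed up front.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:32', '2022-01-01 17:04:30', '2022-01-01 17:04:32',
        '2022-01-02 00:12:02', '2022-01-02 11:23:16', '2022-01-02 18:13:30',
        '2022-01-02 21:23:52', '2022-01-02 19:43:12']),
    'price': [1, 2, 3, 4, 5, 6, 7, 8],
    'ticker': ['a', 'b', 'a', 'b', 'c', 'c', 'a', 'e'],
})
df2 = pd.DataFrame({
    'time': pd.to_datetime([
        '2022-01-01 17:03:50', '2022-01-01 17:06:52', '2022-01-01 17:07:02',
        '2022-01-02 00:17:42', '2022-01-02 11:18:16', '2022-01-02 18:13:39',
        '2022-01-02 21:24:12', '2022-01-02 19:43:12']),
    'amount': [10, 12, 13, 14, 15, 16, 17, 18],
    'ticker': ['a', 'b', 'b', 'c', 'd', 'e', 'a', 'c'],
})


def nearest_price(row):
    """Hypothetical helper: price of the df1 row with the same ticker
    whose time is closest to row['time'], or NaN if the ticker is absent."""
    same = df1[df1['ticker'] == row['ticker']]
    if same.empty:
        return float('nan')
    nearest_idx = (same['time'] - row['time']).abs().idxmin()
    return same.loc[nearest_idx, 'price']


out = df2.copy()
out['balance'] = out['amount'] * out.apply(nearest_price, axis=1)
print(out)
```

On the sample data this gives the same balances as the merge_asof answer, with NaN for ticker d.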