I have 2 dataframes:
- Dataframe 1: Contains a number of unique securities with its current price.
- Dataframe 2: Contains new prices from Bloomberg (hypothetically) within a pricing window for the called securities in df1.
The challenge I am trying to solve is to obtain a new price for each security held in df1 from df2. If the new price for a matching security is within a set tolerance percentage (1%) than that price will be used. Additionally, it needs to take the price with the lowest percentage difference within that pricing window of 10 mins (14:50 to 14:59) for a set date.
Dataframe1:
import pandas as pd
import numpy as np
# Create Internal File
data = [['1',99.434],['2',99.987],['3',98.117],['4',95.557]]
df = pd.DataFrame(data, columns = ['Security_ID', 'OldPrice'])
Output of df1:
Security_ID OldPrice
0 1 99.434
1 2 99.987
2 3 98.117
3 4 95.557
Dataframe2 (Price file from Bloomberg):
data = [['1','10/10/2022 14:50',98.50],
['1','10/10/2022 14:51',98.30],
['1','10/10/2022 14:52',98.00],
['1','10/10/2022 14:53',101.34],
['1','10/10/2022 14:54',98.30],
['1','10/10/2022 14:55',98.60],
['1','10/10/2022 14:56',99.24],
['1','10/10/2022 14:57',99.99],
['1','10/10/2022 14:58',98.40],
['1','10/10/2022 14:59',101.12],
['2','10/10/2022 14:50',101.12],
['2','10/10/2022 14:51',101.12],
['2','10/10/2022 14:52',98.9],
['2','10/10/2022 14:53',98.9],
['2','10/10/2022 14:54',98.9],
['2','10/10/2022 14:55',98.9],
['2','10/10/2022 14:56',98.9],
['2','10/10/2022 14:57',101.12],
['2','10/10/2022 14:58',101.12],
['2','10/10/2022 14:59',101.12],
['3','10/10/2022 14:50',99.6],
['3','10/10/2022 14:51',99.6],
['3','10/10/2022 14:52',99.6],
['3','10/10/2022 14:53',99.6],
['3','10/10/2022 14:54',100],
['3','10/10/2022 14:55',98.8],
['3','10/10/2022 14:56',99.6],
['3','10/10/2022 14:57',99.6],
['3','10/10/2022 14:58',99.6],
['3','10/10/2022 14:59',99.6]]
df2 = pd.DataFrame(data, columns = ['BB_Security_ID', 'BB_Time', 'BB_NewPrice'])
Output of df2 (df2.head(5):
BB_Security_ID BB_Time BB_NewPrice
0 1 10/10/2022 14:50 98.50
1 1 10/10/2022 14:51 98.30
2 1 10/10/2022 14:52 98.00
3 1 10/10/2022 14:53 101.34
4 1 10/10/2022 14:54 98.30
# change time data type
df2['BB_Time'] = pd.to_datetime(df2['BB_Time'], errors='coerce')
df2['BB_Time'] = pd.to_datetime(df2['BB_Time'].dt.strftime('%d/%m/%Y %H:%M:%S'))
df2['BB_Security_ID']=df2['BB_Security_ID'].astype(str)
# Join dataframes on security id
df = pd.merge(df, df2, left_on = ['Security_ID'],right_on = ['BB_Security_ID'], how = 'left')
# function to calculate percentage difference
def percentage_dff(col1,col2):
"""
Calculate percentage difference between 2 columns and return absolute value
"""
return abs(((col1 - col2)/ col2) * 100)
df['PriceDff'] = percentage_dff(df['OldPrice'],df['BB_NewPrice'])
Output of joined dfs with price difference added (df.head(5))
Security_ID OldPrice BB_Security_ID BB_Time BB_NewPrice PriceDff
0 1 99.434 1 2022-10-10 14:50:00 98.50 0.948223
1 1 99.434 1 2022-10-10 14:51:00 98.30 1.153611
2 1 99.434 1 2022-10-10 14:52:00 98.00 1.463265
3 1 99.434 1 2022-10-10 14:53:00 101.34 1.880797
4 1 99.434 1 2022-10-10 14:54:00 98.30 1.153611
Desired Outcome: The following output is the expected outcome I am looking to get to that I haven't figured out yet. Any help will be much appreciated.
Security_ID OldPrice BB_Time BB_NewPrice PriceDff WithinTolerence
0 1 99.434 2022-10-10 14:57:00 99.24 0.195486 Y
1 2 99.987 NaN NaN NaN N
2 3 98.117 2022-10-10 14:55:00 98.80 0.691296 Y
3 4 95.557 NaN NaN NaN NoPriceFound
The system can accept the new price if its within 1% tolerance window otherwise no price is accepted for downstream use. If there is a price under the tolerance set, it needs to find the closest match with the lowest % difference
The expected outcome justification:
- Security ID 1 should accept new price of 99.24 priced at 2022-10-10 14:57:00 as it has the lowest percentage difference, so WithinTolerence set to Y.
- Security ID 2 does not accept any new prices as they are all outside the tolerance window, so WithinTolerence set to N.
- Security ID 3 should accept new price of 98.80 priced at 2022-10-10 14:55:00. It is the only price within the pricing window under set tolerance so WithinTolerence set to Y.
- Security ID 4 - No price available from Bloomberg so the column "WithinTolerence" is labelled as NoPriceFound.
CodePudding user response:
I start at df
after you merge df1
and df2
.
df.sort_values(by=['Security_ID', 'PriceDff'], inplace=True)
out = df.groupby('Security_ID')[['OldPrice', 'BB_NewPrice', 'PriceDff']].first().reset_index().copy()
out['BB_NewPrice'] = np.where(out['PriceDff'] <= 1.0, out['BB_NewPrice'], np.NaN)