Tried to keep this relatively simple but let me know if you need more information.
I have 2 dictionaries made up of three dataframes each, these have been produced through loops then added into a dictionary. They have the keys ['XAUUSD', 'EURUSD', 'GBPUSD'] in common:
trades_dict
{'XAUUSD': df_trades_1
'EURUSD': df_trades_2
'GBPUSD': df_trades_3}
prices_dict
{'XAUUSD': df_prices_1
'EURUSD': df_prices_2
'GBPUSD': df_prices_3}
I would like to merge the tables on the closest timestamps to produce 3 new dataframes such that the XAUUSD trades dataframe is merged with the corresponding XAUUSD prices dataframe and so on
I have been able to join the dataframes in a loop using:
df_merge_list = []
for trades in trades_dict.values():
for prices in prices_dict.values():
df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
df_merge_list.append(df_merge)
However this produces a list of 9 dataframes, XAUUSD trades XAUUSD price, XAUUSD trades EURUSD price and XAUUSD trades GBPUSD price etc.
Is there a way for me to join only the dataframes where the keys are identical? I'm assuming it will need to be something like this: if trades_dict.keys() == prices_dict.keys():
df_merge_list = []
for trades in trades_dict.values():
for prices in prices_dict.values():
if trades_dict.keys() == prices_dict.keys():
df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
df_merge_list.append(df_merge)
but I'm getting the same result as above
Am I close? How can I do this for all instruments and only produce the 3 outputs I need? Any help is appreciated
Thanks in advance
CodePudding user response:
You need to provide the exact dataframes with the correct column names in a reproducible form but you can use a dictionary like this:
import numpy as np
import pandas as pd
np.random.seed(42)
df_trades_1 = df_trades_2 = df_trades_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID1', 'Val1'])
df_prices_1 = df_prices_2 = df_prices_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID2', 'Val2'])
trades_dict = {'XAUUSD':df_trades_1, 'EURUSD':df_trades_2, 'GBPUSD':df_trades_3}
prices_dict = {'XAUUSD':df_prices_1, 'EURUSD':df_prices_2, 'GBPUSD':df_prices_3}
frames ={}
for t in trades_dict.keys():
frames[t] = (pd.concat([trades_dict[t], prices_dict[t]], axis = 1))
frames['XAUUSD']
This would concatenate the two dataframes, making them both available under the same key:
ID1 Val1 ID2 Val2
0 0.374540 0.950714 0.611853 0.139494
1 0.731994 0.598658 0.292145 0.366362
2 0.156019 0.155995 0.456070 0.785176
3 0.058084 0.866176 0.199674 0.514234
4 0.601115 0.708073 0.592415 0.046450
5 0.020584 0.969910 0.607545 0.170524
6 0.832443 0.212339 0.065052 0.948886
7 0.181825 0.183405 0.965632 0.808397
8 0.304242 0.524756 0.304614 0.097672
9 0.431945 0.291229 0.684233 0.440152
You may need some error checking in case your keys don't match or the kind of join (left, right, inner etc.) depending upon your columns but that's the gist of it.
CodePudding user response:
"""
Pseudocode :
For each key in the list of keys in trades_dict :
Pick that key's value (trades df) from trades_dict
Using the same key, pick corresponding value (prices df) from prices_dict
Merge both values (trades & prices dataframes)
"""
df_merge_list = []
for key in trades_dict.keys():
trades = trades_dict[key]
prices = prices_dict[key] # using the same key to get corresponding prices
df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
df_merge_list.append(df_merge)
What went wrong in code posted in question?
Nested for loop creates cartesian product
3 iterations in outer loop multiplied by 3 iterations in inner loop = 9 iterations
Result of
trades_dict.keys() == prices_dict.keys()
isTrue
in all 9 iterationsdict_a_all_keys == dict_b_all_keys
is not same asdict_a_key_1 == dict_b_key_1
. So, you could iterate through keys of dictionary and check if they are matching in nested loop, like this :
df_merge_list = []
for trades_key in trades_dict.keys():
for prices_key in prices_dict.keys():
if trades_key == prices_key:
trades = trades_dict[trades_key]
prices = prices_dict[trades_key] # since trades_key is same as prices_key, they are interchangeable
df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
df_merge_list.append(df_merge)