Merge dataframes from two dictionaries through a loop-CodePudding

Tried to keep this relatively simple but let me know if you need more information.

I have 2 dictionaries made up of three dataframes each, these have been produced through loops then added into a dictionary. They have the keys ['XAUUSD', 'EURUSD', 'GBPUSD'] in common:

trades_dict

{'XAUUSD': df_trades_1

'EURUSD': df_trades_2

'GBPUSD': df_trades_3}

prices_dict

{'XAUUSD': df_prices_1

'EURUSD': df_prices_2

'GBPUSD': df_prices_3}

I would like to merge the tables on the closest timestamps to produce 3 new dataframes such that the XAUUSD trades dataframe is merged with the corresponding XAUUSD prices dataframe and so on

I have been able to join the dataframes in a loop using:

df_merge_list = []

for trades in trades_dict.values():
    for prices in prices_dict.values():
        df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
        df_merge_list.append(df_merge)

However this produces a list of 9 dataframes, XAUUSD trades XAUUSD price, XAUUSD trades EURUSD price and XAUUSD trades GBPUSD price etc.

Is there a way for me to join only the dataframes where the keys are identical? I'm assuming it will need to be something like this: if trades_dict.keys() == prices_dict.keys():

df_merge_list = []

for trades in trades_dict.values():
    for prices in prices_dict.values():
        if trades_dict.keys() == prices_dict.keys():
            df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
            df_merge_list.append(df_merge)

but I'm getting the same result as above

Am I close? How can I do this for all instruments and only produce the 3 outputs I need? Any help is appreciated

Thanks in advance

CodePudding user response：

You need to provide the exact dataframes with the correct column names in a reproducible form but you can use a dictionary like this:

import numpy as np
import pandas as pd

np.random.seed(42)
df_trades_1 = df_trades_2 = df_trades_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID1', 'Val1'])
df_prices_1 = df_prices_2 = df_prices_3 = pd.DataFrame(np.random.rand(10, 2), columns = ['ID2', 'Val2'])
trades_dict = {'XAUUSD':df_trades_1, 'EURUSD':df_trades_2, 'GBPUSD':df_trades_3}
prices_dict = {'XAUUSD':df_prices_1, 'EURUSD':df_prices_2, 'GBPUSD':df_prices_3}

frames ={}
for t in trades_dict.keys():
    frames[t] = (pd.concat([trades_dict[t], prices_dict[t]], axis = 1))
frames['XAUUSD']

This would concatenate the two dataframes, making them both available under the same key:

        ID1      Val1       ID2      Val2
0  0.374540  0.950714  0.611853  0.139494
1  0.731994  0.598658  0.292145  0.366362
2  0.156019  0.155995  0.456070  0.785176
3  0.058084  0.866176  0.199674  0.514234
4  0.601115  0.708073  0.592415  0.046450
5  0.020584  0.969910  0.607545  0.170524
6  0.832443  0.212339  0.065052  0.948886
7  0.181825  0.183405  0.965632  0.808397
8  0.304242  0.524756  0.304614  0.097672
9  0.431945  0.291229  0.684233  0.440152

You may need some error checking in case your keys don't match or the kind of join (left, right, inner etc.) depending upon your columns but that's the gist of it.

CodePudding user response：

"""
Pseudocode :
For each key in the list of keys in trades_dict :
    Pick that key's value (trades df) from trades_dict
    Using the same key, pick corresponding value (prices df) from prices_dict
    Merge both values (trades & prices dataframes)
"""

df_merge_list = []

for key in trades_dict.keys():
    trades = trades_dict[key]
    prices = prices_dict[key] # using the same key to get corresponding prices

    df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
    df_merge_list.append(df_merge)

What went wrong in code posted in question?

Nested for loop creates cartesian product

3 iterations in outer loop multiplied by 3 iterations in inner loop = 9 iterations
Result of trades_dict.keys() == prices_dict.keys() is True in all 9 iterations

dict_a_all_keys == dict_b_all_keys is not same as dict_a_key_1 == dict_b_key_1. So, you could iterate through keys of dictionary and check if they are matching in nested loop, like this :

df_merge_list = []

for trades_key in trades_dict.keys():
    for prices_key in prices_dict.keys():
        if trades_key == prices_key:
            trades = trades_dict[trades_key]
            prices = prices_dict[trades_key] # since trades_key is same as prices_key, they are interchangeable
            df_merge = pd.merge_asof(trades, prices, left_on='transact_time', right_on='time', direction='backward')
            df_merge_list.append(df_merge)