Home > Enterprise >  How to substact each other in a dictionary of dataframes in Python
How to substact each other in a dictionary of dataframes in Python

Time:11-18

I have a dictionary of dataframes, looks like {test1:df1, test2:df2, test3:df3,...}.

In each dataframe, I have a column named "Successful Rate".

If I print it out, it looks like this:

'test 1'

Symbol Successful Rate
0001.A 0.4
0002.B 0.5
0003.C 0.78

'test 2'

Symbol Successful Rate
0002.A 0.66
0002.G 0.88
0005.C 0.99

'test 3'

Symbol Successful Rate
0034.EF 0.3
0001.A 0.9
0005.C 0.7

We don't know how many dataframes will there be in the dictionary(The # of dataframes are dynamic).

In each dataframe, the number of rows are might be different(some have 20,000 rows and others might have 20,007 rows), but they must have lots of same Symbols.

Now I want to create new dataframes, comparing the difference of successful rates between different tests, the ideal results might look like this:

'test 1 & test 2'

Symbol Successful Rate Differnece
0001.A 0.1
0002.B 0.2
0003.C 0.03

'test 2 & test 3'

Symbol Successful Rate Difference
0002.A 0.09
0002.G 0.22
0005.C 0.13

'test 3 & test 1'

Symbol Successful Rate Difference
0034.EF 0.04
0001.A 0.04
0005.C 0.03

CodePudding user response:

import pandas as pd
import itertools

# mimic the input data
test1 = pd.DataFrame({'Symbol': ['A', 'B', 'C'], 'Rate': [0.1, 0.2, 0.3]})
test2 = pd.DataFrame({'Symbol': ['B', 'C', 'D'], 'Rate': [0.4, 0.6, 0.8]})
test3 = pd.DataFrame({'Symbol': ['C', 'D', 'E'], 'Rate': [0.5, 0.6, 0.8]})
df_collection = {'test1': test1, 'test2': test2, 'test3': test3}

# shuffle two elements from a list, put into a generator
coms = itertools.combinations(df_collection.keys(), 2)

results = {}

# iterate the generator, do the comparison, and store the output in results.
for x, y in coms:
    df_x = df_collection[x]  
    df_y = df_collection[y]
    df_merge = df_x.merge(df_y, on='Symbol', how='inner' ,indicator=False)
    df_merge['Diff'] = df_merge.apply(lambda row: row[f'Rate_x'] - row['Rate_y'], axis=1)
    df_merge.drop(['Rate_x', 'Rate_y'], axis=1, inplace=True)
    results[f'{x}-{y}'] = df_merge

for k, v in results.items():
    print(k)
    print(v)

Output

test1-test2
  Symbol  Diff
0      B  -0.2
1      C  -0.3
test1-test3
  Symbol  Diff
0      C  -0.2
test2-test3
  Symbol  Diff
0      C   0.1
1      D   0.2
  • Related