I have a dictionary of dataframes, looks like {test1:df1, test2:df2, test3:df3,...}.
In each dataframe, I have a column named "Successful Rate".
If I print it out, it looks like this:
'test 1'
Symbol | Successful Rate |
---|---|
0001.A | 0.4 |
0002.B | 0.5 |
0003.C | 0.78 |
'test 2'
Symbol | Successful Rate |
---|---|
0002.A | 0.66 |
0002.G | 0.88 |
0005.C | 0.99 |
'test 3'
Symbol | Successful Rate |
---|---|
0034.EF | 0.3 |
0001.A | 0.9 |
0005.C | 0.7 |
We don't know how many dataframes will there be in the dictionary(The # of dataframes are dynamic).
In each dataframe, the number of rows are might be different(some have 20,000 rows and others might have 20,007 rows), but they must have lots of same Symbols.
Now I want to create new dataframes, comparing the difference of successful rates between different tests, the ideal results might look like this:
'test 1 & test 2'
Symbol | Successful Rate Differnece |
---|---|
0001.A | 0.1 |
0002.B | 0.2 |
0003.C | 0.03 |
'test 2 & test 3'
Symbol | Successful Rate Difference |
---|---|
0002.A | 0.09 |
0002.G | 0.22 |
0005.C | 0.13 |
'test 3 & test 1'
Symbol | Successful Rate Difference |
---|---|
0034.EF | 0.04 |
0001.A | 0.04 |
0005.C | 0.03 |
CodePudding user response:
import pandas as pd
import itertools
# mimic the input data
test1 = pd.DataFrame({'Symbol': ['A', 'B', 'C'], 'Rate': [0.1, 0.2, 0.3]})
test2 = pd.DataFrame({'Symbol': ['B', 'C', 'D'], 'Rate': [0.4, 0.6, 0.8]})
test3 = pd.DataFrame({'Symbol': ['C', 'D', 'E'], 'Rate': [0.5, 0.6, 0.8]})
df_collection = {'test1': test1, 'test2': test2, 'test3': test3}
# shuffle two elements from a list, put into a generator
coms = itertools.combinations(df_collection.keys(), 2)
results = {}
# iterate the generator, do the comparison, and store the output in results.
for x, y in coms:
df_x = df_collection[x]
df_y = df_collection[y]
df_merge = df_x.merge(df_y, on='Symbol', how='inner' ,indicator=False)
df_merge['Diff'] = df_merge.apply(lambda row: row[f'Rate_x'] - row['Rate_y'], axis=1)
df_merge.drop(['Rate_x', 'Rate_y'], axis=1, inplace=True)
results[f'{x}-{y}'] = df_merge
for k, v in results.items():
print(k)
print(v)
Output
test1-test2
Symbol Diff
0 B -0.2
1 C -0.3
test1-test3
Symbol Diff
0 C -0.2
test2-test3
Symbol Diff
0 C 0.1
1 D 0.2