import pandas as pd
# first dataframe
A = pd.DataFrame({'clients':[123, 123, 124, 124], 'Product':['stock', 'bonds','stocks','bonds'], 'net':[100, 0, 300, 200]}, index = [0, 1, 2, 4])
clients Product net
0 123 stock 100
1 123 bonds 0
2 124 stocks 300
4 124 bonds 200
# second dataframe
B = pd.DataFrame({'clients':[123, 123, 124, 124], 'Product':['bonds', 'stocks','stocks','bonds'], 'net':[100, 50, 250, 100]}, index = [0, 2, 3, 5])
clients Product net
0 123 bonds 100
2 123 stocks 50
3 124 stocks 250
5 124 bonds 100
I want to subtract ('B - A') only the same product net value of a given client, but I have an index problem as you notice above. I want the output to be:
clients Product net difference
? 123 stocks -50
? 123 bonds 100
? 124 stocks -50
? 124 bonds -100
I used '?' in the indexes, because I don't care about them.
CodePudding user response:
You need to use sort_values
and reset_index
:
A = A.sort_values(['clients', 'Product'])
B = B.sort_values(['clients', 'Product'])
A = A.reset_index(drop=True)
A['net'] = B['net'].reset_index(drop=True) - A['net'].reset_index(drop=True)
Output:
>>> A
clients Product net
0 123 bonds 100
1 123 stocks -50
2 124 bonds -100
3 124 stocks -50
CodePudding user response:
To perform unambiguous operations, you need to align the values. merge
is one option:
(A.merge(B, on=['clients', 'Product'])
.assign(**{'net difference': lambda d: d['net_y']-d['net_x']})
.drop(columns=['net_x', 'net_y'])
)
Alternative, setting a temporary index:
cols = ['clients', 'Product']
((B.set_index(cols)-A.set_index(cols))
.rename({'net': 'net difference'})
.reset_index()
)
Output:
clients Product net difference
0 123 stocks -50
1 123 bonds 100
2 124 stocks -50
3 124 bonds -100