Home > Mobile >  Pandas: How to subtract value from columns by rows from different dataframes
Pandas: How to subtract value from columns by rows from different dataframes

Time:11-25

Using Pandas data frames, how do I subtract to find the differences in columns '(x$1000)' and 'PRN AMT' between data frames based on 'CUSIP'(which acts as a unique id)? The dataset I provided is a sample, so the solution must be able to contend with a different order. I've tried reading documentation on dataframe.subtract() and don't understand how to apply it to my specific problem.

data frame 1:

CUSIP       (x$1000)     PRN AMT   TICKER                            
594918104  2765345852  2114080582   MSFT
037833100  1891440177  3058252946   AAPL
02079K305  1721936077   132543866  GOOGL
023135106  1341784239  2573051329   AMZN

data frame 2:

CUSIP       (x$1000)     PRN AMT   TICKER  
594918104  3034828140  1612323669   MSFT
037833100  2463247977  2628732382   AAPL
02079K305  2096049986    93429916  GOOGL
023135106  1581124222   118724459   AMZN

Wanted output:

CUSIP       (x$1000)     PRN AMT   TICKER  
594918104  -269482288  501756913   MSFT
037833100  -571807800  429520564   AAPL
02079K305  -374113909   39113950   GOOGL
023135106  -239339983  2454326870   AMZN

Here is the code to recreate the dataframes:

import pandas as pd
dataset_1 = {'CUSIP': ['594918104', '037833100', '02079K305', '023135106'], '(x$1000)': [
    2765345852, 1891440177, 1721936077, 1341784239], 'PRN AMT': [2114080582, 3058252946, 132543866, 2573051329], 'TICKER': ['MSFT', 'AAPL', 'GOOGL', 'AMZN']}
dataset_2 = {'CUSIP': ['594918104', '037833100', '02079K305', '023135106'], '(x$1000)': [
    3034828140, 2463247977, 2096049986, 1581124222], 'PRN AMT': [1612323669, 2628732382, 93429916, 118724459], 'TICKER': ['MSFT', 'AAPL', 'GOOGL', 'AMZN']}
df_1 = pd.DataFrame(data=dataset_1)
df_2 = pd.DataFrame(data=dataset_2)
print(f'{df_1} and {df_2}')

CodePudding user response:

Create MultiIndex by CUSIP,TICKER and subtract by DataFrame.sub, last DataFrame.reset_index and change order of columns by DataFrame.reindex:

df = (df_1.set_index(['CUSIP','TICKER'])
          .sub(df_2.set_index(['CUSIP','TICKER']))
          .reset_index()
          .reindex(df_1.columns, axis=1))
print (df)

       CUSIP   (x$1000)     PRN AMT TICKER
0  594918104 -269482288   501756913   MSFT
1  037833100 -571807800   429520564   AAPL
2  02079K305 -374113909    39113950  GOOGL
3  023135106 -239339983  2454326870   AMZN
  • Related