Home > other >  Subtract dataframes with completely different row names and column names
Subtract dataframes with completely different row names and column names

Time:11-02

My dataframe 1 looks like this:

windcodes name yield perp
163197.SH shangguo comp 2.9248 NO
154563.SH guosheng comp 2.886 Yes
789645.IB guoyou comp 3.418 NO

My dataframe 2 looks like this

windcodes CALC
1202203.IB 2.5517
1202203.IB 2.48457
1202203.IB 2.62296

and I want my result dataframe 3 to have one more new column than dataframe 1 which is to use the value in column 'yield' in dataframe 1 subtract the value in column 'CALC' in dataframe 2: The result dataframe 3 should be looking like this

windcodes name yield perp yield-CALC
163197.SH shangguo comp 2.9248 NO 0.3731
154563.SH guosheng comp 2.886 Yes 0.40413
789645.IB guoyou comp 3.418 NO 0.79504

It would be really helpful if anyone can tell me how to do it in python.

CodePudding user response:

You can try something like this:

df1['yield-CALC'] = df1['yield'] - df2['yield']

I'm assuming you don't want to join the dataframes, since the windcodes are not the same.

CodePudding user response:

Do we need to join 2 dataframes from windcodes column? The windcodes are all the same in the sample data you have given in Dataframe2. Can you explain this?

If we are going to join from the windscode field. The code below will work.

df = pd.merge(left=df1, right=df2,how='inner',on='windcodes')
df['yield-CALC'] = df['yield']-df['CALC']

CodePudding user response:

I will try to keep it as elaborated as possible:

environment I have used for coding is Jupyter Notebook

importing our required pandas library

import pandas as pd

getting your first table data in form of lists of lists (you can also use csv,excel etc here)

data_1 = [["163197.SH","shangguo comp",2.9248,"NO"],\
["154563.SH","guosheng comp",2.886,"Yes"] , ["789645.IB","guoyou comp",3.418,"NO"]]

creating dataframe one :

df_1 = pd.DataFrame(data_1 , columns = ["windcodes","name","yield","perp"])
df_1

Output:

enter image description here

getting your second table data in form of lists of lists (you can also use csv,excel etc here)

data_2 = [["1202203.IB",2.5517],["1202203.IB",2.48457],["1202203.IB",2.62296]]

creating dataframe two :

df_2 = pd.DataFrame(data_2 , columns = ["windcodes","CALC"])
df_2

Output:

enter image description here

Now creating the third dataframe:

df_3 = df_1 # becasue first 4 columns are same as our first dataframe 
df_3

Output:

enter image description here

Now calculating the fourth column i.e "yield-CALC" :

df_3["yield-CALC"] = df_1["yield"] - df_2["CALC"] # each df_1 datapoint will be subtracted from df_2 datapoint one by one (still confused? search for "SIMD")

df_3

Output:

enter image description here

Hope you understood.

Happy coding!

CodePudding user response:

Just in case you have completely different indexes, use df2's underlying numpy array:

df1['yield-CALC'] = df1['yield'] - df2['yield'].values
  • Related