Creating a new column based on several groupby conditions in python

Time:07-25

I have the following dataframe. I want to create a new variable, Bond, that divides the Rate for Pool collateral by the Rate for Single collateral, for each country and each year. How would you do that? For example, you would divide the rate in row 0 by the rate in row 4, or the rate in row 8 by the rate in row 7. For country-year pairs that do not have both a Pool and a Single observation, I want a missing value.

import pandas as pd

data = {
    'Date': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
    'Country': ['DE', 'AT', 'GB', 'FR', 'DE', 'FR', 'AT', 'GB', 'GB', 'DE', 'AT', 'FR', 'AT'],
    'Collateral': ['Pool', 'Pool', 'Single', 'Single', 'Single', 'Pool', 'Single', 'Single', 'Pool', 'Pool', 'Pool', 'Pool', 'Single'],
    'Rate': [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4],
}

df = pd.DataFrame(data)

df

CodePudding user response:

How about something like this?

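# Split by collateral type; drop the Collateral column, which is constant within each subset
df_pool = df.query("Collateral == 'Pool'").drop(columns="Collateral")
df_single = df.query("Collateral == 'Single'").drop(columns="Collateral")
# merge_ordered does an outer merge by default, so unmatched country-year pairs are kept with NaN
df_merge = pd.merge_ordered(df_pool, df_single, on=["Country", "Date"], suffixes=("_pool", "_single"))
df_merge["Bond"] = df_merge["Rate_pool"] / df_merge["Rate_single"]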

Output:

   Date Country  Rate_pool  Rate_single       Bond
0  2016      AT      0.300        0.005  60.000000
1  2017      AT      0.670        0.400   1.675000
2  2016      DE      0.200        0.600   0.333333
3  2017      DE      0.500          NaN        NaN
4  2016      FR      0.002        0.400   0.005000
5  2017      FR      0.080          NaN        NaN
6  2016      GB        NaN        0.100        NaN
7  2017      GB      0.900        0.200   4.500000
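
If you would rather not build two separate frames, a pivot gives the same ratios. This is only a sketch of an alternative, not what the answer above does: it assumes each (Country, Date, Collateral) combination appears at most once (df.pivot raises an error on duplicates) and a pandas version that accepts a list for index. The names wide and bond are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
    "Country": ["DE", "AT", "GB", "FR", "DE", "FR", "AT", "GB", "GB", "DE", "AT", "FR", "AT"],
    "Collateral": ["Pool", "Pool", "Single", "Single", "Single", "Pool", "Single",
                   "Single", "Pool", "Pool", "Pool", "Pool", "Single"],
    "Rate": [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4],
})

# One row per (Country, Date), one column per collateral type.
# Combinations that are missing in the data show up as NaN.
wide = df.pivot(index=["Country", "Date"], columns="Collateral", values="Rate")

# NaN in either column propagates, so incomplete pairs get a missing Bond.
wide["Bond"] = wide["Pool"] / wide["Single"]

# Back to ordinary columns instead of a MultiIndex.
bond = wide.reset_index()
print(bond)
```

The division needs no special handling for the incomplete pairs, because any arithmetic with NaN yields NaN.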

CodePudding user response:

Using the method from the question "Set column name for apply result over groupby":

df2 = (df.groupby(['Country', 'Date'], as_index = True)
       .apply(lambda grp: pd.Series({'Bond': grp[grp.Collateral=='Pool'].Rate.mean()/
                                              grp[grp.Collateral=='Single'].Rate.mean()})))
print(df2)

Output:

                  Bond
Country Date           
AT      2016  60.000000
        2017   1.675000
DE      2016   0.333333
        2017        NaN
FR      2016   0.005000
        2017        NaN
GB      2016        NaN
        2017   4.500000
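
Both results so far have one row per country and year. Since the question asks for a new column, here is a rough sketch of how the ratio could be attached back to the original rows. It is a variation rather than the code above: it swaps groupby.apply for a groupby mean followed by unstack, and the names means, ratio and df_with_bond are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
    "Country": ["DE", "AT", "GB", "FR", "DE", "FR", "AT", "GB", "GB", "DE", "AT", "FR", "AT"],
    "Collateral": ["Pool", "Pool", "Single", "Single", "Single", "Pool", "Single",
                   "Single", "Pool", "Pool", "Pool", "Pool", "Single"],
    "Rate": [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4],
})

# Mean rate per (Country, Date, Collateral), then spread Collateral into columns.
# Taking the mean mirrors the answer above and tolerates duplicate observations.
means = (df.groupby(["Country", "Date", "Collateral"])["Rate"]
           .mean()
           .unstack("Collateral"))

# Pool / Single per (Country, Date); NaN where either side is missing.
ratio = (means["Pool"] / means["Single"]).rename("Bond")

# Look up each row's (Country, Date) in the ratio's MultiIndex.
# The original 13 rows are kept, and rows of the same pair share one Bond value.
df_with_bond = df.join(ratio, on=["Country", "Date"])
print(df_with_bond)
```

Every row of a given country and year gets the same Bond value, whether it is the Pool row or the Single row, which may or may not be what you want.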