I have the following dataframe. I want to create another variable, Bond, that divides the Rate for Pool collateral by the Rate for Single collateral, for each country and each year. How would you do that? For example, you would divide the rate in row 0 by the rate in row 4, or the rate in row 8 by the rate in row 7. For country-year combinations that do not have both a Pool and a Single observation, I want a missing value.
import pandas as pd
df={'Date': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
'Country':['DE', 'AT','GB', 'FR', 'DE', 'FR', 'AT', 'GB', 'GB', 'DE', 'AT', 'FR' , 'AT'],
'Collateral':['Pool', 'Pool', 'Single', 'Single', 'Single', 'Pool', 'Single', 'Single', 'Pool', 'Pool', 'Pool', 'Pool', 'Single'],
'Rate': [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4]}
df = pd.DataFrame(df)
df
CodePudding user response:
How about something like this?
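df_pool = df.query("Collateral == 'Pool'").drop(columns="Collateral")
df_single = df.query("Collateral == 'Single'").drop(columns="Collateral")
# merge_ordered does an outer join by default, so unmatched country-year pairs are kept with NaN
df_merge = pd.merge_ordered(df_pool, df_single, on=["Country", "Date"], suffixes=("_pool", "_single"))
df_merge["Bond"] = df_merge["Rate_pool"] / df_merge["Rate_single"]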
Output:
   Date Country  Rate_pool  Rate_single       Bond
0  2016      AT      0.300        0.005  60.000000
1  2017      AT      0.670        0.400   1.675000
2  2016      DE      0.200        0.600   0.333333
3  2017      DE      0.500          NaN        NaN
4  2016      FR      0.002        0.400   0.005000
5  2017      FR      0.080          NaN        NaN
6  2016      GB        NaN        0.100        NaN
7  2017      GB      0.900        0.200   4.500000
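The question asks for Bond as a new variable on the original dataframe, whereas the merge above yields one row per country-year. A minimal sketch of one way to attach the ratio back to every original row, assuming a left merge on Country and Date is acceptable; the names `ratios` and `df_with_bond` are illustrative and not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
    'Country': ['DE', 'AT', 'GB', 'FR', 'DE', 'FR', 'AT', 'GB', 'GB', 'DE', 'AT', 'FR', 'AT'],
    'Collateral': ['Pool', 'Pool', 'Single', 'Single', 'Single', 'Pool', 'Single',
                   'Single', 'Pool', 'Pool', 'Pool', 'Pool', 'Single'],
    'Rate': [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4],
})

# Split by collateral type, keeping only the keys and the rate.
pool = df.loc[df['Collateral'] == 'Pool', ['Country', 'Date', 'Rate']]
single = df.loc[df['Collateral'] == 'Single', ['Country', 'Date', 'Rate']]

# Outer join so country-year pairs with only one collateral type survive with NaN.
ratios = pool.merge(single, on=['Country', 'Date'], how='outer',
                    suffixes=('_pool', '_single'))
ratios['Bond'] = ratios['Rate_pool'] / ratios['Rate_single']

# Left merge: every original row is kept, in its original order, and gains a Bond column.
# (hypothetical name: df_with_bond)
df_with_bond = df.merge(ratios[['Country', 'Date', 'Bond']],
                        on=['Country', 'Date'], how='left')
print(df_with_bond)
```

Both the Pool row and the Single row of a given country-year receive the same Bond value, and rows whose country-year lacks one of the two collateral types get NaN.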
CodePudding user response:
Using the method from "Set column name for apply result over groupby":
df2 = (df.groupby(['Country', 'Date'], as_index=True)
         .apply(lambda grp: pd.Series({
             'Bond': grp[grp.Collateral == 'Pool'].Rate.mean()
                     / grp[grp.Collateral == 'Single'].Rate.mean()})))
print(df2)
Output:
                   Bond
Country Date
AT      2016  60.000000
        2017   1.675000
DE      2016   0.333333
        2017        NaN
FR      2016   0.005000
        2017        NaN
GB      2016        NaN
        2017   4.500000
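The same per-group ratio can also be computed without `apply` by reshaping to wide form first. This is only a sketch of an alternative, not the method used in the answers above: it assumes `pivot_table` with `aggfunc='mean'` is acceptable (it averages duplicate country-year-collateral rows, mirroring the `.mean()` calls in the groupby version), and the names `wide` and `bond` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': [2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017],
    'Country': ['DE', 'AT', 'GB', 'FR', 'DE', 'FR', 'AT', 'GB', 'GB', 'DE', 'AT', 'FR', 'AT'],
    'Collateral': ['Pool', 'Pool', 'Single', 'Single', 'Single', 'Pool', 'Single',
                   'Single', 'Pool', 'Pool', 'Pool', 'Pool', 'Single'],
    'Rate': [0.2, 0.3, 0.1, 0.4, 0.6, 0.002, 0.005, 0.2, 0.9, 0.5, 0.67, 0.08, 0.4],
})

# One row per (Country, Date), one column per collateral type.
# A missing collateral type for a country-year shows up as NaN.
wide = df.pivot_table(index=['Country', 'Date'], columns='Collateral',
                      values='Rate', aggfunc='mean')

# Element-wise division; NaN in either column propagates to the result.
bond = (wide['Pool'] / wide['Single']).rename('Bond')
print(bond)
```

The result is a Series indexed by Country and Date, like `df2` above; calling `bond.reset_index()` turns it into a flat dataframe if that is easier to merge back onto `df`.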