I need some help here. In this DataFrame I need to perform a series of calculations by grouping column'cnpj'
What I have as input:
cnpj valor_cota ln_ativo ln_IBX CDI tracking
0 cat1 1114.7521 0.027152 -0.003659 23.893879 0.030811
1 cat1 1135.3557 0.018314 0.003691 23.904737 0.014623
2 cat1 1149.3381 0.012240 0.007960 23.915600 0.004281
3 cat2 500.0000 0.002002 -0.003659 23.893879 0.005661
4 cat2 501.0000 0.001998 0.003691 23.904737 -0.001693
5 cat2 510.0000 0.017805 0.007960 23.915600 0.009845
6 cat3 10000.0000 -0.006976 -0.003659 23.893879 -0.003317
7 cat3 10050.0000 0.004988 0.003691 23.904737 0.001297
8 cat3 10100.0000 0.004963 0.007960 23.915600 -0.002997
What I expect as an output:
cnpj ret_5_CDI t_5_Asset anual_5 vol alfa beta sharpe tracking_err R^2
0 cat1 0,000909 0,031026 0,006130 0,119035 0,022631 -1,274627 0,253007 0,013372 0,997950
1 cat2 0,000909 0,020000 0,003968 0,144852 0,004041 1,211459 0,131796 0,005841 0,608783
2 cat3 0,000909 0,010000 0,001992 0,109531 0,003323 1,093127 0,082999 0,002576 0,866884
Here is what I got so far:
import numpy as np
import pandas as pd
from statistics import stdev
#Retorno CDI no periodo 5 anos (Dataframe)
def v1(g):
first, last = g['CDI'].values[[0,-1]]
return last/first-1
#Retorno Ativo no periodo 5 anos (Dataframe)
def v2(g):
first, last = g['valor_cota'].values[[0,-1]]
return last/first-1
#Retorno do ativo anualizado 5 anos
def v3(g):
first, last = g['valor_cota'].values[[0,-1]]
result = last/first-1
return ((1 result)**(1/5))-1
#Volatilidade do ativo 5 anos
def v4(g):
return stdev(g['ln_ativo'])*(252**0.5)
#Alfa
def v5(g):
slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
return intercept
#Beta
def v6(g):
slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
return slope
#Sharpe
def v7(g):
return (v2(g)-v1(g))/v4(g)
#Tracking Error
def v8(g):
return stdev(g['tracking'])
# R²
def v9(g):
slope, intercept = np.polyfit(g['ln_ativo'], g['ln_IBX'], 1)
r_squared = 1 - (sum((g['ln_IBX'] - (slope * g['ln_ativo'] intercept))**2) / ((len(g['ln_IBX']) - 1) * np.var(g['ln_IBX'], ddof=1)))
return r_squared
data = {'cnpj': ['cat1', 'cat1', 'cat1', 'cat2', 'cat2', 'cat2', 'cat3', 'cat3', 'cat3'],
'valor_cota': [1114.7521, 1135.3557, 1149.3381, 500, 501, 510, 10000, 10050, 10100 ],
'ln_ativo': [0.02715207, 0.01831395, 0.01224022, 0.00200200, 0.00199800, 0.01780462, -0.00697561, 0.00498754, 0.00496279 ],
'ln_IBX': [-0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953 ],
'CDI' : [23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001 ],
'tracking' : [0.030810905, 0.014623144, 0.004280688, 0.005660841, -0.0016928, 0.009845095, -0.00331678, 0.001296739, -0.00299674]
}
df =pd.DataFrame(data)
df.groupby('cnpj').apply(v1).reset_index()
df.groupby('cnpj').apply(v2).reset_index()
df.groupby('cnpj').apply(v3).reset_index()
df.groupby('cnpj').apply(v4).reset_index()
df.groupby('cnpj').apply(v5).reset_index()
df.groupby('cnpj').apply(v6).reset_index()
df.groupby('cnpj').apply(v7).reset_index()
df.groupby('cnpj').apply(v8).reset_index()
df.groupby('cnpj').apply(v9).reset_index()
I would like each call of these functions as a column in the result DataFrame as noted as expected result above. Hope to have achieved best practice questioning.
CodePudding user response:
You can pass multiple functions to apply
; then construct a DataFrame with the result:
grouped = df.groupby('cnpj').apply(lambda g: [v1(g),v2(g),v3(g),v4(g),v5(g),v6(g),v7(g),v8(g),v9(g)])
out = pd.DataFrame(grouped.tolist(), index=grouped.index,
columns=['ret_5_CDI','t_5_Asset','anual_5','vol','alfa',
'beta','sharpe','tracking_err','R^2']).reset_index()
Output:
cnpj ret_5_CDI t_5_Asset anual_5 vol alfa beta sharpe tracking_err R^2
0 cat1 0.000909 0.031026 0.006130 0.119035 0.022631 -1.274627 0.253007 0.013372 0.997950
1 cat2 0.000909 0.020000 0.003968 0.144852 0.004041 1.211459 0.131797 0.005841 0.608784
2 cat3 0.000909 0.010000 0.001992 0.109531 -0.001920 1.093126 0.082999 0.002576 0.866884