There is a dataset with three columns:
- Col 1: Name_of_Village
- Col 2: Average_monthly_savings
- Col 3: networth_in_dollars
I want to use Pandas to create a dictionary "Vill_corr" where the keys are the village names and the values are the correlation coefficients between Col 2 and Col 3 within each village.
I know how to calculate a correlation coefficient, but I am not sure how to store one against each village name key:
corr = df["Average_monthly_savings"].corr(df["networth_in_dollars"])
Please help.
CodePudding user response:
Use groupby.apply and Series.corr:
import numpy as np
import pandas as pd

# Reproducible example data: 100 rows spread over four villages
np.random.seed(0)
df = pd.DataFrame({'Name_of_Village': np.random.choice(list('ABCD'), size=100),
                   'Average_monthly_savings': np.random.randint(0, 1000, size=100),
                   'networth_in_dollars': np.random.randint(0, 1000, size=100),
                  })

# One correlation coefficient per village
out = (df.groupby('Name_of_Village')
         .apply(lambda g: g['Average_monthly_savings'].corr(g['networth_in_dollars']))
      )
Output:
Name_of_Village
A -0.081200
B -0.020895
C 0.208151
D -0.010569
dtype: float64
As a dictionary:
out.to_dict()
Output:
{'A': -0.08120016678846673,
'B': -0.020894973553868202,
'C': 0.20815112481676484,
'D': -0.010569152488799725}
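If you would rather build the dictionary directly, without the intermediate Series, a dict comprehension over the groups should also work. This is only a minimal sketch: the six-row dataset below is made up for illustration (it is not the data from the question), and it was chosen so that village A is perfectly positively correlated and village B perfectly negatively correlated.

```python
import pandas as pd

# Made-up illustration data (an assumption, not the asker's dataset).
df = pd.DataFrame({
    'Name_of_Village': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Average_monthly_savings': [1, 2, 3, 1, 2, 3],
    'networth_in_dollars': [10, 20, 30, 30, 20, 10],
})

# Iterate over (village, sub-DataFrame) pairs and compute one
# Pearson correlation per village, keyed by the village name.
Vill_corr = {
    village: g['Average_monthly_savings'].corr(g['networth_in_dollars'])
    for village, g in df.groupby('Name_of_Village')
}

print(Vill_corr)
```

Here the value for 'A' comes out at about 1.0 and the value for 'B' at about -1.0. Note that a village with only one row has no defined correlation, so its value would be NaN.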