Create a dictionary of all unique keys in a column and store correlation coefficients of other colu...

There is a dataset with three columns:

Col 1: Name_of_Village
Col 2: Average_monthly_savings
Col 3: networth_in_dollars

I want to create a dictionary "Vill_corr" where the keys are the village names and the values are the correlation coefficients between Col 2 and Col 3 within each village, using pandas.

I know how to calculate a correlation coefficient, but I am not sure how to store one against each village name as a key:

corr = df["Average_monthly_savings"].corr(df["networth_in_dollars"])

Please help.

CodePudding user response:

Use groupby.apply and Series.corr:

import numpy as np
import pandas as pd

# Fix the seed so the random example data is reproducible
np.random.seed(0)

df = pd.DataFrame({'Name_of_Village': np.random.choice(list('ABCD'), size=100),
                   'Average_monthly_savings': np.random.randint(0, 1000, size=100),
                   'networth_in_dollars': np.random.randint(0, 1000, size=100),
                  })

out = (df.groupby('Name_of_Village')
         .apply(lambda g: g['Average_monthly_savings'].corr(g['networth_in_dollars']))
      )

Output:

Name_of_Village
A   -0.081200
B   -0.020895
C    0.208151
D   -0.010569
dtype: float64

As dictionary:

out.to_dict()

Output:

{'A': -0.08120016678846673,
 'B': -0.020894973553868202,
 'C': 0.20815112481676484,
 'D': -0.010569152488799725}
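
If the goal is only the dictionary, the same per-village correlation can also be collected with a dictionary comprehension over the groups. This is a minimal sketch rather than the approach from the answer above: the small data frame and its values are made up so the result is easy to check by hand, and the name Vill_corr is taken from the question.

```python
import pandas as pd

# Hypothetical toy data: in village A net worth rises with savings,
# in village B it falls, so the expected correlations are +1 and -1.
df = pd.DataFrame({
    'Name_of_Village': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Average_monthly_savings': [100, 200, 300, 100, 200, 300],
    'networth_in_dollars': [1000, 2000, 3000, 3000, 2000, 1000],
})

# Iterating over a groupby yields (village name, sub-DataFrame) pairs,
# so each village's correlation can be stored under its name directly.
Vill_corr = {
    village: g['Average_monthly_savings'].corr(g['networth_in_dollars'])
    for village, g in df.groupby('Name_of_Village')
}

print(Vill_corr)
```

Both versions call Series.corr once per village; the comprehension simply skips the intermediate Series, while groupby.apply followed by to_dict() is handier when the per-village values are also needed as a pandas object.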