Correlation with groupby returns all NaN values (Python DataFrame)

I have a dataframe like this (the real DF has 94 columns and 40 rows):

NAME	TIAS	EFGA	SOE	KERA	CODE	SURVIVAL
SOAP corp	1.391164e+10	1.265005e+10	0.000000e+00	186522000.0	366	21
NiANO inc	42673.0	0.0	0.0	42673.0	366	3
FFS jv	9.523450e+05	NaN	NaN	8.754379e+09	737	4
KELL Corp	1.045967e+07	9.935970e+05	0.000000e+00	NaN	737	4
Os inc	7.732654e+10	4.046270e+07	1.391164e+10	8.754379e+09	737	4

I need to compute the correlation within each group defined by CODE. The target is the SURVIVAL column. I tried this:

df = df.groupby('CODE').corr()[['SURVIVAL']]

but it returns something like this:

CODE		SURVIVAL
366	TIAS	NaN
	EFGA	NaN
	SOE	NaN
	KERA	NaN
	SURVIVAL	NaN
737	TIAS	NaN
	EFGA	NaN
	SOE	NaN
	KERA	NaN
	SURVIVAL	NaN

Why is it NaN in every row? I tried filling the NaNs in the DataFrame with the column means before computing the correlations:

df = df.fillna(df.mean())

or dropping them, but neither works.

But when I compute the correlation on the whole DataFrame without any modification, like this:

df.corr()[['SURVIVAL']]

everything works and I get correlations, not NaNs.

All dtypes are float64 and int64. Is there a way to get the per-group correlations without NaNs? I have no idea why it works on the whole DataFrame but not within groups.

Thank you in advance for help!

CodePudding user response：

You can do it this way:

df = df.groupby('CODE')[['SURVIVAL']].corr()
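
A note on the snippet above: selecting only [['SURVIVAL']] before corr() leaves each group with a 1x1 matrix (SURVIVAL against itself), so the other columns never enter the result. A minimal sketch of the difference, assuming made-up toy data (the values and the numeric_cols list are illustrative, not the asker's real frame); it keeps the numeric columns in the selection and slices out SURVIVAL afterwards:

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration only (not the asker's real frame).
df = pd.DataFrame({
    "NAME": ["a", "b", "c", "d", "e", "f"],
    "TIAS": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "KERA": [3.0, 2.0, 1.0, 5.0, np.nan, 7.0],
    "CODE": [366, 366, 366, 737, 737, 737],
    "SURVIVAL": [2, 4, 6, 4, 4, 4],
})

# Selecting SURVIVAL alone: one row per group, SURVIVAL vs. SURVIVAL.
only_survival = df.groupby("CODE")[["SURVIVAL"]].corr()

# Hypothetical column list: keep the numeric columns (NAME is text and
# is left out), compute the full matrix per group, then keep the
# SURVIVAL column of each matrix.
numeric_cols = ["TIAS", "KERA", "SURVIVAL"]
per_group = df.groupby("CODE")[numeric_cols].corr()[["SURVIVAL"]]

print(only_survival)
print(per_group)
```

In this toy frame group 737 has a constant SURVIVAL, so its correlations are NaN however the columns are selected; Pearson correlation is undefined when one side has zero variance.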

CodePudding user response：

Try this:

survival_corr = lambda x: x.corrwith(x['SURVIVAL'])
by_code = df.groupby('CODE')
by_code.apply(survival_corr)
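
The corrwith approach can be sketched end to end as follows. The toy data, the features list and the helper names are assumptions for illustration. Selecting the columns on the groupby keeps the text column NAME and the grouping key out of the computation, and the last lines list groups where SURVIVAL cannot correlate with anything (fewer than two rows, or a single distinct value), which is one possible reason for an all-NaN result:

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration only (not the asker's real frame).
df = pd.DataFrame({
    "NAME": ["a", "b", "c", "d", "e", "f"],
    "TIAS": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "KERA": [3.0, 2.0, 1.0, 5.0, np.nan, 7.0],
    "CODE": [366, 366, 366, 737, 737, 737],
    "SURVIVAL": [2, 4, 6, 4, 4, 4],
})

# Hypothetical list of feature columns to correlate with SURVIVAL.
features = ["TIAS", "KERA"]

def survival_corr(group):
    # Correlate each feature column with SURVIVAL inside one group.
    return group[features].corrwith(group["SURVIVAL"])

# Restrict the groupby to the columns that are actually needed.
by_code = df.groupby("CODE")[features + ["SURVIVAL"]]
result = by_code.apply(survival_corr)
print(result)

# Groups that can only produce NaN: too few rows or a constant SURVIVAL.
stats = df.groupby("CODE")["SURVIVAL"].agg(["size", "nunique"])
degenerate = stats[(stats["size"] < 2) | (stats["nunique"] < 2)]
print(degenerate)
```

If the real frame shows most CODE groups in the degenerate table, filling or dropping NaNs will not help, because the NaN comes from the lack of variation inside each group rather than from missing values.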