Correlation with groupby returns all NaN values (Python DataFrame)

Time:11-08

I have a dataframe like this (the real DF has 94 columns and 40 rows):

NAME       TIAS          EFGA          SOE           KERA          CODE  SURVIVAL
SOAP corp  1.391164e+10  1.265005e+10  0.000000e+00  186522000.0   366   21
NiANO inc  42673.0       0.0           0.0           42673.0       366   3
FFS jv     9.523450e+05  NaN           NaN           8.754379e+09  737   4
KELL Corp  1.045967e+07  9.935970e+05  0.000000e+00  NaN           737   4
Os inc     7.732654e+10  4.046270e+07  1.391164e+10  8.754379e+09  737   4
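To reproduce the problem, the sample rows above can be rebuilt as a DataFrame. This is a sketch that rests on one assumption the question does not state: NAME is treated as the index. The question says all column types are float64 and int64, which would not hold if NAME were a text column.

```python
import numpy as np
import pandas as pd

# Sample rows from the question. Using NAME as the index is an assumption,
# made so that every remaining column is numeric (float64 / int64).
df = pd.DataFrame(
    {
        "TIAS": [1.391164e10, 42673.0, 9.523450e05, 1.045967e07, 7.732654e10],
        "EFGA": [1.265005e10, 0.0, np.nan, 9.935970e05, 4.046270e07],
        "SOE": [0.0, 0.0, np.nan, 0.0, 1.391164e10],
        "KERA": [186522000.0, 42673.0, 8.754379e09, np.nan, 8.754379e09],
        "CODE": [366, 366, 737, 737, 737],
        "SURVIVAL": [21, 3, 4, 4, 4],
    },
    index=pd.Index(
        ["SOAP corp", "NiANO inc", "FFS jv", "KELL Corp", "Os inc"], name="NAME"
    ),
)

# Inspect the column types of the rebuilt frame.
print(df.dtypes)
```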

I need to compute the correlations within each CODE group, with the SURVIVAL column as the target. I tried this:

df = df.groupby('CODE').corr()[['SURVIVAL']]

but it returns something like this:

               SURVIVAL
CODE
366  TIAS           NaN
     EFGA           NaN
     SOE            NaN
     KERA           NaN
     SURVIVAL       NaN
737  TIAS           NaN
     EFGA           NaN
     SOE            NaN
     KERA           NaN
     SURVIVAL       NaN

Why is every value NaN? Before computing the correlations, I tried filling the NaNs in the DataFrame with the column means:

df = df.fillna(df.mean())

I also tried dropping them instead, but neither works.

However, when I compute the correlation on the whole DataFrame without any modifications, like this:

df.corr()[['SURVIVAL']]

everything works and I get actual correlation values, not NaNs.

All column types are float64 or int64. Is there a way to get per-group correlations without NaNs? I have no idea why it works on the whole DataFrame but not within groups.
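Judging only from the sample rows, one plausible cause is that SURVIVAL is constant inside a group: every CODE 737 row has SURVIVAL = 4. Pearson correlation divides by each column's standard deviation, so a zero-variance column correlates as NaN with everything, and filling or dropping NaNs does not change that. A pair also needs at least two rows where both values are present. Across the whole frame SURVIVAL does vary, which would explain why df.corr() works there. The sketch below uses a small hypothetical frame (made-up TIAS values, SURVIVAL copied from the sample) to check the per-group spread and show the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: SURVIVAL varies inside CODE 366 but is constant
# inside CODE 737, while still varying across the whole frame.
df = pd.DataFrame(
    {
        "TIAS": [1.0, 2.0, 3.0, 5.0, 8.0],
        "CODE": [366, 366, 737, 737, 737],
        "SURVIVAL": [21, 3, 4, 4, 4],
    }
)

# Distinct SURVIVAL values per group: 1 means zero variance in that group.
counts = df.groupby("CODE")["SURVIVAL"].nunique()
print(counts)

# Correlation of TIAS with SURVIVAL inside each group; the value columns
# are selected explicitly so the CODE key is not passed into apply.
per_group = df.groupby("CODE")[["TIAS", "SURVIVAL"]].apply(
    lambda g: g["TIAS"].corr(g["SURVIVAL"])
)
print(per_group)  # CODE 737 is NaN because SURVIVAL is constant there

# Over the whole frame SURVIVAL varies, so the correlation is defined.
overall = df["TIAS"].corr(df["SURVIVAL"])
print(overall)
```

If the real data has little or no SURVIVAL variation within each CODE, no amount of NaN handling will produce per-group correlations; the groups themselves would need to contain varying SURVIVAL values.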

Thank you in advance for your help!

Answer:

You can do it this way:

df = df.groupby('CODE')[['SURVIVAL']].corr()
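A minimal sketch on a small hypothetical frame (the data is made up) that shows what this call returns. Because only SURVIVAL is selected before .corr(), each group's result is a 1x1 matrix of SURVIVAL against itself. That is 1.0 where SURVIVAL varies within the group and NaN where it is constant.

```python
import pandas as pd

# Hypothetical data: SURVIVAL varies in CODE 366 and is constant in CODE 737.
df = pd.DataFrame(
    {
        "TIAS": [1.0, 2.0, 3.0, 5.0, 8.0],
        "CODE": [366, 366, 737, 737, 737],
        "SURVIVAL": [21, 3, 4, 4, 4],
    }
)

# Only the SURVIVAL column is kept before .corr(), so each CODE group
# yields a single entry: SURVIVAL correlated with itself.
result = df.groupby("CODE")[["SURVIVAL"]].corr()
print(result)
```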

Answer:

Try this:

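# Correlate every column of a group with that group's SURVIVAL column
survival_corr = lambda x: x.corrwith(x['SURVIVAL'])
by_code = df.groupby('CODE')
# Apply per CODE group: one row of correlations per CODE value
by_code.apply(survival_corr)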
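The corrwith approach can be exercised on a small hypothetical frame (the data and the value_cols list are made up for illustration). This sketch makes two choices that the answer's snippet does not: it turns the lambda into a named function, which is purely a style choice, and it selects the value columns explicitly so that the CODE key is not fed to corrwith.

```python
import pandas as pd

# Hypothetical numeric data in the question's layout; SURVIVAL varies
# inside both CODE groups here.
df = pd.DataFrame(
    {
        "TIAS": [1.0, 2.0, 3.0, 5.0, 8.0],
        "KERA": [4.0, 1.0, 2.0, 7.0, 3.0],
        "CODE": [366, 366, 737, 737, 737],
        "SURVIVAL": [21, 3, 4, 9, 6],
    }
)


def survival_corr(group: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every column in `group` with its SURVIVAL column."""
    return group.corrwith(group["SURVIVAL"])


# Hypothetical list of value columns; CODE is left out so it is only the key.
value_cols = ["TIAS", "KERA", "SURVIVAL"]

# One row per CODE, one column per value column.
result = df.groupby("CODE")[value_cols].apply(survival_corr)
print(result)
```

Keeping SURVIVAL in value_cols adds a sanity-check column. It is 1.0 for each group, or NaN for any group where SURVIVAL is constant, which is the same condition that produces the NaNs in the question.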