I have a dataframe like this (the real DF has 94 columns and 40 rows):
NAME | TIAS | EFGA | SOE | KERA | CODE | SURVIVAL |
---|---|---|---|---|---|---|
SOAP corp | 1.391164e+10 | 1.265005e+10 | 0.000000e+00 | 186522000.0 | 366 | 21 |
NiANO inc | 42673.0 | 0.0 | 0.0 | 42673.0 | 366 | 3 |
FFS jv | 9.523450e+05 | NaN | NaN | 8.754379e+09 | 737 | 4 |
KELL Corp | 1.045967e+07 | 9.935970e+05 | 0.000000e+00 | NaN | 737 | 4 |
Os inc | 7.732654e+10 | 4.046270e+07 | 1.391164e+10 | 8.754379e+09 | 737 | 4 |
I need to compute, within each CODE group, the correlation of every column with the target column SURVIVAL. I tried this:
df = df.groupby('CODE').corr()[['SURVIVAL']]
but it returns something like this:
CODE | | SURVIVAL |
---|---|---|
366 | TIAS | NaN |
 | EFGA | NaN |
 | SOE | NaN |
 | KERA | NaN |
 | SURVIVAL | NaN |
737 | TIAS | NaN |
 | EFGA | NaN |
 | SOE | NaN |
 | KERA | NaN |
 | SURVIVAL | NaN |
Why is every value NaN? I tried filling the NaNs in the DataFrame with the column means before computing the correlations:
df = df.fillna(df.mean())
or dropping them, but neither works.
But when I compute the correlation on the whole dataframe without any modifications, like this:
df.corr()[['SURVIVAL']]
everything works and I get correlations, not NaNs.
All dtypes are float64 and int64. Is there a way to get the correlation by group without NaNs? I have no idea why it works on the whole dataframe but not on the groups.
Thank you in advance for help!
CodePudding user response:
You can do it this way:
df = df.groupby('CODE')[['SURVIVAL']].corr()
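One caveat worth checking: selecting `[['SURVIVAL']]` before `.corr()` leaves a single column in each group, so the result is only SURVIVAL correlated with itself. The sketch below uses made-up toy data (not the real frame) to show that, and also counts the distinct SURVIVAL values per group. In the sample rows of the question, all three CODE 737 rows have SURVIVAL 4. A column that is constant within a group has zero variance, so its Pearson correlation is undefined and comes out as NaN. Whether that is what happens in the real 40-row frame is an assumption I can't verify from here.

```python
import pandas as pd

# Hypothetical toy data, not the asker's real frame.
df = pd.DataFrame({
    "CODE": [366, 366, 366, 737, 737, 737],
    "TIAS": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "SURVIVAL": [21, 3, 10, 4, 4, 4],
})

# Only SURVIVAL is selected, so each group's matrix is 1x1:
# SURVIVAL correlated with itself.
self_corr = df.groupby("CODE")[["SURVIVAL"]].corr()
print(self_corr)

# Distinct SURVIVAL values per group. A count of 1 means the column is
# constant in that group, which makes its correlation NaN.
distinct = df.groupby("CODE")["SURVIVAL"].nunique()
print(distinct)
```

If `distinct` shows 1 for a group, filling or dropping NaNs will not help for that group, because the problem is the missing variation rather than the missing values.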
CodePudding user response:
Try this:
survival_corr = lambda x: x.corrwith(x['SURVIVAL'])
by_code = df.groupby('CODE')
by_code.apply(survival_corr)
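Here is a self-contained sketch of the same `corrwith` idea on made-up data, in case it helps to test it in isolation. The frame, its values, and the dict-based loop are my own assumptions for illustration. I iterate over the groups explicitly instead of calling `apply`, which sidesteps version-dependent handling of the grouping column, and I keep only numeric feature columns so the text column NAME is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same kind of columns as in the question.
df = pd.DataFrame({
    "NAME": ["a", "b", "c", "d", "e", "f"],
    "CODE": [366, 366, 366, 737, 737, 737],
    "TIAS": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
    "KERA": [2.0, np.nan, 6.0, 5.0, 1.0, 3.0],
    "SURVIVAL": [2, 4, 6, 1, 2, 3],
})

def survival_corr(group):
    # Keep numeric feature columns only: NAME is text, CODE is the group key,
    # and SURVIVAL is the target itself.
    features = group.select_dtypes("number").drop(columns=["CODE", "SURVIVAL"])
    # Correlate each remaining column with SURVIVAL inside this group.
    return features.corrwith(group["SURVIVAL"])

# One Series of correlations per CODE value, stacked into a table
# with one row per group and one column per feature.
by_code = {code: survival_corr(g) for code, g in df.groupby("CODE")}
result = pd.DataFrame(by_code).T
result.index.name = "CODE"
print(result)
```

Any group where SURVIVAL or a feature column is constant will still give NaN for that entry, since the correlation is undefined there.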