Calculate Pearson Correlation on single column in pandas dataframe

Time:12-21

I have a dataframe as shown below. I am trying to find the Pearson correlation between "host1" and "host2".

print(df)

Server  Timestamp           Value
host1   12/20/2021 12:53    83.73
host1   12/20/2021 12:54    55.32
host1   12/20/2021 12:56    76.52
host1   12/20/2021 12:57    7.57
host1   12/20/2021 12:58    81.59
host1   12/20/2021 13:00    5.72
host1   12/20/2021 13:01    26.33
host2   12/20/2021 12:53    82.41
host2   12/20/2021 12:54    65.8
host2   12/20/2021 12:56    71.64
host2   12/20/2021 12:57    39.45
host2   12/20/2021 12:58    8.37
host2   12/20/2021 13:00    82.89
host2   12/20/2021 13:01    15.54

I created the df1 and df2 data frames to separate the two servers' values, then checked the correlation between the two data frames.

df1=df.loc[0:6, ['Value']]
df2=df.loc[7:14, ['Value']]
df1.corr(df2)

Error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Expected output:

Relationship matrix
        host1   host2
host1   1       0.5
host2   0.5     1

CodePudding user response:

You can first pivot your dataframe to create two columns with your 'host1' and 'host2' values, and then calculate the correlation using corr():

df.pivot(columns='Server', values='Value')\
    .apply(lambda r: r.dropna()\
           .reset_index(drop=True)).corr()

Server    host1    host2
Server                  
host1   1.00000  0.04225
host2   0.04225  1.00000
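
A variation on the same idea, offered as a sketch rather than the answerer's method: because both servers report at the same timestamps in the sample data, the pivot can use 'Timestamp' as the index, which pairs the readings by time and avoids the dropna/reset_index step. This assumes each (Timestamp, Server) pair occurs only once; the names wide and corr_matrix are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Server": ["host1"] * 7 + ["host2"] * 7,
    "Timestamp": ["12/20/2021 12:53", "12/20/2021 12:54", "12/20/2021 12:56",
                  "12/20/2021 12:57", "12/20/2021 12:58", "12/20/2021 13:00",
                  "12/20/2021 13:01"] * 2,
    "Value": [83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
              82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54],
})

# One row per timestamp, one column per server.
wide = df.pivot(index="Timestamp", columns="Server", values="Value")

# Pairwise Pearson correlation between the server columns.
corr_matrix = wide.corr(method="pearson")
print(corr_matrix)
```

If the servers did not share timestamps, the unmatched rows would hold NaN and corr() would skip them pair by pair, so the positional approach above may be the better fit in that case.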

CodePudding user response:

Make the following change to your code and you will be good to go:

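# Select a Series (not a one-column DataFrame) so that Series.corr(other) applies.
df1 = df.loc[0:6, 'Value']
# Series.corr pairs values by index label, so renumber host2's rows to 0..6.
df2 = df.loc[7:14, 'Value'].reset_index(drop=True)
df1.corr(df2)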
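
Some background on why selecting a Series matters, with a small sketch. DataFrame.corr takes method as its first parameter, so df1.corr(df2) on two DataFrames most likely ends up treating df2 as the method name, which would explain the ambiguous truth value error. Series.corr(other) does accept a second Series, but it aligns the two on their index labels first. The example below assumes the question's data and uses the made-up names host1, host2, misaligned and aligned.

```python
import pandas as pd

# Values copied from the question; Timestamp is omitted because it is not needed here.
values = [83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
          82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54]
df = pd.DataFrame({"Server": ["host1"] * 7 + ["host2"] * 7, "Value": values})

host1 = df.loc[df["Server"] == "host1", "Value"]  # index labels 0..6
host2 = df.loc[df["Server"] == "host2", "Value"]  # index labels 7..13

# The two Series share no index labels, so no values are paired and the result is NaN.
misaligned = host1.corr(host2)

# Renumbering both indexes from 0 pairs the values by position.
aligned = host1.reset_index(drop=True).corr(host2.reset_index(drop=True))

print(misaligned, aligned)
```

Selecting rows with a boolean mask on 'Server' instead of hard-coded row numbers is a design choice for the sketch; it keeps working if the number of rows per server changes.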

CodePudding user response:

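df1 = df[0:7]   # host1 rows (positional slice, end exclusive)
df2 = df[7:14]  # host2 rows
# Reset the indexes so concat lines the two columns up row by row.
data = [df1.Value.reset_index(drop=True), df2.Value.reset_index(drop=True)]
labels = ["host1", "host2"]
df3 = pd.concat(data, axis=1, keys=labels)
df4 = df3.corr(method='pearson', min_periods=1)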