I have a dataframe as below. I want to find the Pearson correlation between "host1" and "host2".
print(df)
Server Timestamp Value
host1 12/20/2021 12:53 83.73
host1 12/20/2021 12:54 55.32
host1 12/20/2021 12:56 76.52
host1 12/20/2021 12:57 7.57
host1 12/20/2021 12:58 81.59
host1 12/20/2021 13:00 5.72
host1 12/20/2021 13:01 26.33
host2 12/20/2021 12:53 82.41
host2 12/20/2021 12:54 65.8
host2 12/20/2021 12:56 71.64
host2 12/20/2021 12:57 39.45
host2 12/20/2021 12:58 8.37
host2 12/20/2021 13:00 82.89
host2 12/20/2021 13:01 15.54
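This is a minimal sketch that rebuilds the printed frame so the code in this thread can be run as-is. The list name `times` is hypothetical. Parsing Timestamp as month/day/year is an assumption based on how the values print.

```python
import pandas as pd

# Timestamps shared by both hosts (assumed month/day/year, as printed above)
times = ["12/20/2021 12:53", "12/20/2021 12:54", "12/20/2021 12:56",
         "12/20/2021 12:57", "12/20/2021 12:58", "12/20/2021 13:00",
         "12/20/2021 13:01"]

df = pd.DataFrame({
    "Server": ["host1"] * 7 + ["host2"] * 7,
    "Timestamp": pd.to_datetime(times * 2, format="%m/%d/%Y %H:%M"),
    "Value": [83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
              82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54],
})
print(df)
```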
I created the df1 and df2 dataframes to separate the two hosts and then checked the correlation between them:
df1=df.loc[0:6, ['Value']]
df2=df.loc[7:14, ['Value']]
df1.corr(df2)
Error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
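Here is a short sketch reproducing the error on the question's values. `DataFrame.corr(method='pearson', min_periods=1, ...)` correlates the columns of a single frame. Passing df2 positionally therefore puts it in the `method` parameter. When pandas compares it with a method name such as "pearson", the comparison yields a DataFrame, and pandas refuses to guess that DataFrame's truth value. `DataFrame.corrwith` is shown as one way to correlate two frames. It is a substitute technique and does not come from the question.

```python
import pandas as pd

# Values from the question; host1 occupies rows 0-6 and host2 rows 7-13
df = pd.DataFrame({
    "Server": ["host1"] * 7 + ["host2"] * 7,
    "Value": [83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
              82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54],
})

df1 = df.loc[0:6, ["Value"]]    # a list of labels returns a one-column DataFrame
df2 = df.loc[7:14, ["Value"]]

error_message = None
try:
    # DataFrame.corr's first positional parameter is `method`, so df2 lands there
    df1.corr(df2)
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# corrwith correlates same-named columns of two frames; it aligns on the index,
# so the original row labels are dropped first to pair the rows by position
result = df1.reset_index(drop=True).corrwith(df2.reset_index(drop=True))
print(result["Value"])
```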
Expected output:
Correlation matrix
host1 host2
host1 1 0.5
host2 0.5 1
Answer:
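You can first pivot your dataframe so that the 'host1' and 'host2' values become two columns, and then calculate the correlation with corr(). Because pivot is called without an index, it keeps the original row labels, so each host column is half NaN. Applying dropna() and reset_index(drop=True) to each column packs its values into rows 0-6 so the two hosts line up by position: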
df.pivot(columns='Server', values='Value')\
.apply(lambda r: r.dropna()\
.reset_index(drop=True)).corr()
Server host1 host2
Server
host1 1.00000 0.04225
host2 0.04225 1.00000
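The following is an alternative sketch, not the method used in this answer. It pivots with Timestamp as the index, so each host's readings are matched by time rather than by position. It assumes each (Timestamp, Server) pair occurs only once, which pivot requires. The names `times` and `wide` are hypothetical.

```python
import pandas as pd

times = ["12/20/2021 12:53", "12/20/2021 12:54", "12/20/2021 12:56",
         "12/20/2021 12:57", "12/20/2021 12:58", "12/20/2021 13:00",
         "12/20/2021 13:01"]
df = pd.DataFrame({
    "Server": ["host1"] * 7 + ["host2"] * 7,
    "Timestamp": times * 2,
    "Value": [83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
              82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54],
})

# One row per Timestamp, one column per host
wide = df.pivot(index="Timestamp", columns="Server", values="Value")
corr_matrix = wide.corr(method="pearson")
print(corr_matrix)
```

Suppose one host were missing a reading. That timestamp's row would then hold a NaN, and DataFrame.corr would exclude it pairwise instead of shifting the remaining values out of step.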
Answer:
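The error comes from passing a DataFrame to DataFrame.corr. Its first parameter is method, so df2 is treated as the correlation method and the check against a method name fails. Select 'Value' with a single label instead of a list so that you get two Series, and reset their indexes. Series.corr aligns on index labels, and df1 (labels 0-6) and df2 (labels 7-13) share none, so without the reset the result is NaN:
df1 = df.loc[0:6, 'Value'].reset_index(drop=True)
df2 = df.loc[7:14, 'Value'].reset_index(drop=True)
df1.corr(df2)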
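Here is a small check of the index alignment that Series.corr performs, using the question's values. The names `s`, `misaligned` and `aligned` are hypothetical.

```python
import pandas as pd

s = pd.Series([83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33,
               82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54])
host1 = s.iloc[:7]    # index labels 0..6
host2 = s.iloc[7:]    # index labels 7..13

# Series.corr first aligns on index labels; these share none, so the result is NaN
misaligned = host1.corr(host2)

# Dropping the original labels lets the values pair up by position
aligned = host1.reset_index(drop=True).corr(host2.reset_index(drop=True))
print(misaligned, aligned)
```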
Answer:
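Slice the two hosts by position, reset the indexes so pd.concat pairs the values row by row, and compute the correlation on the combined frame:
import pandas as pd
# Rows 0-6 are host1 and rows 7-13 are host2 (a positional slice excludes its end)
df1 = df[0:7]
df2 = df[7:14]
# Without reset_index, concat would align the two columns on their original labels
data = [df1.Value.reset_index(drop=True), df2.Value.reset_index(drop=True)]
labels = ["host1", "host2"]
df3 = pd.concat(data, axis=1, keys=labels)
df4 = df3.corr(method='pearson', min_periods=1)
print(df4)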
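The off-diagonal value in the answers above is Pearson's r: the sum of co-deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The following sketch cross-checks it with plain NumPy, both from that formula and with np.corrcoef, on the question's values. The variable names are illustrative.

```python
import numpy as np

host1 = np.array([83.73, 55.32, 76.52, 7.57, 81.59, 5.72, 26.33])
host2 = np.array([82.41, 65.8, 71.64, 39.45, 8.37, 82.89, 15.54])

# Deviations from each series' mean
dx = host1 - host1.mean()
dy = host2 - host2.mean()

# Pearson's r from the definition
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the full 2x2 matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(host1, host2)[0, 1]
print(round(r_manual, 5), round(r_numpy, 5))
```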