I've a dataset like this:
1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343
The goal is to plot a heatmap of the values, where the first two columns are the axes and the third is the value.
I've read the data into a DataFrame and pivoted it:
data_str = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343""".split('\n')
import pandas as pd
data = [{'min':line.split()[0], 'max':line.split()[1], 'score':line.split()[2]} for line in data_str]
# pandas 2.0+ requires keyword arguments for pivot
df = pd.DataFrame(data, dtype=float).pivot(index='min', columns='max', values='score')
When I tried out the solution on
But what I am expecting is a triangular heatmap of the values in the score column. How should I go about plotting that?
CodePudding user response:
The function is named get_lower_tri_heatmap, so it draws the lower triangle, whereas your df holds the upper triangle:
df  # upper triangle
Out[101]:
max 1.0 2.0 3.0 4.0 5.0
min
1.0 0.537829 0.553661 0.573658 0.592656 0.607377
2.0 NaN 0.552494 0.575936 0.592823 0.609210
3.0 NaN NaN 0.587435 0.599433 0.613821
4.0 NaN NaN NaN 0.609321 0.618265
5.0 NaN NaN NaN NaN 0.627541
Try passing df.T to the function:
get_lower_tri_heatmap(df.T)
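To see why the transpose helps, here is a minimal sketch that rebuilds the pivoted frame and checks where the NaN cells sit before and after df.T. It does not call get_lower_tri_heatmap, since its definition is not shown in this thread, and the names rows, upper_only and lower_only are only illustrative.

```python
import numpy as np
import pandas as pd

# (min, max, score) triples copied from the question
rows = [
    (1, 1, 0.5378291300966559),
    (1, 2, 0.5536607043661815), (2, 2, 0.5524941673147428),
    (1, 3, 0.5736584823908455), (2, 3, 0.5759360071103211),
    (3, 3, 0.5874347294745028),
    (1, 4, 0.5926563715142762), (2, 4, 0.5928230196644817),
    (3, 4, 0.5994333962893011), (4, 4, 0.6093211865348295),
    (1, 5, 0.6073769581157649), (2, 5, 0.6092100877680258),
    (3, 5, 0.6138206865903788), (4, 5, 0.6182646372625263),
    (5, 5, 0.6275413842906343),
]
df = pd.DataFrame(rows, columns=["min", "max", "score"]).pivot(
    index="min", columns="max", values="score"
)
n = len(df)

# Original frame: values on and above the diagonal, NaN strictly below it
missing = df.isna().to_numpy()
upper_only = bool(
    not missing[np.triu_indices(n)].any()
    and missing[np.tril_indices(n, k=-1)].all()
)

# Transposed frame: values on and below the diagonal, NaN strictly above it
missing_t = df.T.isna().to_numpy()
lower_only = bool(
    not missing_t[np.tril_indices(n)].any()
    and missing_t[np.triu_indices(n, k=1)].all()
)

print(upper_only, lower_only)
```

If both flags are True, df.T has the lower-triangular layout that a lower-triangle plotting function would expect.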
CodePudding user response:
I'm not sure what exactly you tried, but simply plotting your DataFrame as an image works nicely for me:
data_str = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343""".split('\n')
import pandas as pd
data = [{'min':line.split()[0], 'max':line.split()[1], 'score':line.split()[2]} for line in data_str]
# pandas 2.0+ requires keyword arguments for pivot
df = pd.DataFrame(data, dtype=float).pivot(index='min', columns='max', values='score')
# New code below
import matplotlib.pyplot as plt
plt.imshow(df)
plt.show()
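A plain imshow call labels the axes with pixel positions 0 to 4 rather than the min/max values. The sketch below is one possible way to add tick labels and a colorbar; the variable names and the viridis colormap are my own choices, not part of the answer above. NaN cells are left undrawn by imshow, which is what produces the triangle.

```python
import matplotlib.pyplot as plt
import pandas as pd

data_str = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""

# Parse each line into (min, max, score) with explicit numeric types
rows = [(int(lo), int(hi), float(score))
        for lo, hi, score in (line.split() for line in data_str.splitlines())]
df = pd.DataFrame(rows, columns=["min", "max", "score"]).pivot(
    index="min", columns="max", values="score"
)

fig, ax = plt.subplots()
image = ax.imshow(df.to_numpy(), cmap="viridis")  # NaN cells stay blank

# Replace pixel positions with the actual min/max values
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels([str(v) for v in df.columns])
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels([str(v) for v in df.index])
ax.set_xlabel("max")
ax.set_ylabel("min")

fig.colorbar(image, ax=ax, label="score")
plt.show()
```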
CodePudding user response:
I think you should first create a NumPy array of the right size (zero-filled here) and then assign the values into it. It should look something like this:
import matplotlib.pyplot as plt
import numpy as np
a = np.zeros((5, 5))
t = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""
for line in t.splitlines():
    i, j, score = line.split()
    # Indices in the data are 1-based; the array is 0-based
    a[int(i) - 1, int(j) - 1] = float(score)
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
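One caveat with a zero-filled array: the ten unassigned cells are drawn as real values of 0, so the colour scale stretches from 0 to about 0.63 and the differences between the actual scores become hard to see. A possible variant, sketched here under the assumption that the empty half should simply stay blank, fills the array with NaN instead (grid is just an illustrative name):

```python
import matplotlib.pyplot as plt
import numpy as np

t = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""

# Start from NaN so cells without data are not coloured at all
grid = np.full((5, 5), np.nan)
for line in t.splitlines():
    i, j, score = line.split()
    grid[int(i) - 1, int(j) - 1] = float(score)  # 1-based to 0-based

fig, ax = plt.subplots()
image = ax.imshow(grid, cmap="hot", interpolation="nearest")
fig.colorbar(image, ax=ax, label="score")
plt.show()
```

With NaN as the fill value, the colour range is taken from the actual scores only.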