Plotting a heatmap of dataframe values with 2 indices

Time:07-19

I've a dataset like this:

1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343

The goal is to plot a heatmap where the first two columns are the axes and the third column is the value.

I've read the data into a dataframe and pivoted it:

data_str = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343""".split('\n')

import pandas as pd


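# Convert each field to float while building the records
data = [dict(zip(('min', 'max', 'score'), map(float, line.split()))) for line in data_str]
# pivot takes keyword arguments (positional arguments were removed in pandas 2.0)
df = pd.DataFrame(data).pivot(index='min', columns='max', values='score')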
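For comparison, the same table can probably be built by letting pandas parse the text directly. This is only a minimal sketch, assuming a recent pandas; the names `raw` and `long_df` are made up for illustration, while the column names `min`, `max` and `score` are the ones used above.

```python
import io

import pandas as pd

raw = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""

# Parse the whitespace-separated text; the file has no header row,
# so the column names are supplied by hand.
long_df = pd.read_csv(
    io.StringIO(raw), sep=r"\s+", header=None, names=["min", "max", "score"]
)

# Rows are 'min', columns are 'max'; (min, max) pairs that never occur become NaN.
df = long_df.pivot(index="min", columns="max", values="score")
print(df.round(6))
```

Here the index and columns come out as integers rather than floats, because `read_csv` infers the type of each column separately.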

When I tried out a solution I found, I got the following plot: [image not preserved]

But what I expect is a triangular heatmap of the values in the score column. How should I go about plotting that?

CodePudding user response:

The function is named get_lower_tri_heatmap, so it plots the lower triangle, whereas the values in your df sit in the upper triangle:

df  # upper triangle
Out[101]: 
max       1.0       2.0       3.0       4.0       5.0
min                                                  
1.0  0.537829  0.553661  0.573658  0.592656  0.607377
2.0       NaN  0.552494  0.575936  0.592823  0.609210
3.0       NaN       NaN  0.587435  0.599433  0.613821
4.0       NaN       NaN       NaN  0.609321  0.618265
5.0       NaN       NaN       NaN       NaN  0.627541

Try passing df.T to the function:

get_lower_tri_heatmap(df.T)
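The body of get_lower_tri_heatmap is not shown in this thread. As a rough stand-in, the sketch below transposes an upper-triangular frame and draws it with matplotlib; `plot_lower_triangle` is a hypothetical helper, not the function referred to above, and the small frame is built from the first three rows and columns of the output shown earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_lower_triangle(upper_df):
    """Hypothetical helper: draw an upper-triangular frame as a lower triangle."""
    lower = upper_df.T  # transposing moves the values below the diagonal
    fig, ax = plt.subplots()
    # Masking the NaN cells makes it explicit that they are left blank.
    ax.imshow(np.ma.masked_invalid(lower.to_numpy()))
    return fig, lower


# Upper-triangular example: the 3x3 corner of the df printed above.
upper = pd.DataFrame(
    [[0.537829, 0.553661, 0.573658],
     [np.nan, 0.552494, 0.575936],
     [np.nan, np.nan, 0.587435]],
    index=pd.Index([1.0, 2.0, 3.0], name="min"),
    columns=pd.Index([1.0, 2.0, 3.0], name="max"),
)

fig, lower = plot_lower_triangle(upper)
fig.savefig("lower_triangle.png")
```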

CodePudding user response:

I'm not sure what exactly you tried, but simply plotting your dataframe as an image works nicely for me:

data_str = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343""".split('\n')

import pandas as pd


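# Convert each field to float while building the records
data = [dict(zip(('min', 'max', 'score'), map(float, line.split()))) for line in data_str]
# pivot takes keyword arguments (positional arguments were removed in pandas 2.0)
df = pd.DataFrame(data).pivot(index='min', columns='max', values='score')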

# NEW CODE HEREUNDER
import matplotlib.pyplot as plt
plt.imshow(df)
plt.show()
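One thing to be aware of: imshow labels the axes with 0-based positions, not with the min and max values. The sketch below adds tick labels and a colorbar. It assumes a pivoted frame like the one above, rebuilt here from the first six rows of the data so the block runs on its own; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# First six rows of the data from the question (a 3x3 triangle).
rows = [
    (1, 1, 0.5378291300966559),
    (1, 2, 0.5536607043661815),
    (2, 2, 0.5524941673147428),
    (1, 3, 0.5736584823908455),
    (2, 3, 0.5759360071103211),
    (3, 3, 0.5874347294745028),
]
df = pd.DataFrame(rows, columns=["min", "max", "score"]).pivot(
    index="min", columns="max", values="score"
)

fig, ax = plt.subplots()
image = ax.imshow(df.to_numpy())  # NaN cells are left blank

# Put the actual min/max values on the axes instead of 0-based positions.
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(df.index)
ax.set_xlabel(df.columns.name)
ax.set_ylabel(df.index.name)

fig.colorbar(image, ax=ax, label="score")
fig.savefig("labelled_heatmap.png")
```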

CodePudding user response:

I think you should first define an empty numpy array before assigning the values into it. Should look something like this:

import matplotlib.pyplot as plt
import numpy as np
a = np.zeros((5, 5))
t = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""
for line in t.splitlines():
    row, col, score = line.split()  # split each line once
    a[int(row) - 1, int(col) - 1] = float(score)  # indices in the data are 1-based
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
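A possible caveat with np.zeros: the cells that never get a value are drawn as 0, so the color scale has to stretch from 0 up to about 0.63 and the real scores (roughly 0.54 to 0.63) end up looking almost identical. A variation, sketched here under the assumption that blank cells are preferred, fills the array with NaN instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

t = """1 1 0.5378291300966559
1 2 0.5536607043661815
2 2 0.5524941673147428
1 3 0.5736584823908455
2 3 0.5759360071103211
3 3 0.5874347294745028
1 4 0.5926563715142762
2 4 0.5928230196644817
3 4 0.5994333962893011
4 4 0.6093211865348295
1 5 0.6073769581157649
2 5 0.6092100877680258
3 5 0.6138206865903788
4 5 0.6182646372625263
5 5 0.6275413842906343"""

# Start from NaN rather than 0 so that unfilled cells stay blank.
a = np.full((5, 5), np.nan)
for line in t.splitlines():
    row, col, score = line.split()
    a[int(row) - 1, int(col) - 1] = float(score)  # indices in the data are 1-based

fig, ax = plt.subplots()
image = ax.imshow(a, cmap="hot", interpolation="nearest")
fig.colorbar(image, ax=ax)  # the scale now covers only the actual scores
fig.savefig("nan_heatmap.png")
```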