Histogram interpretation of Weka

Time:06-04

I have a question about interpreting data in Weka. The data set I worked on is as follows:

outlook  temperature    humidity    windy   play
------------------------------------------------------
sunny    hot            high        FALSE   no
sunny    hot            high        TRUE    no
overcast hot            high        FALSE   yes
rainy    mild           high        FALSE   yes
rainy    cool           normal      FALSE   yes
rainy    cool           normal      TRUE    no
overcast cool           normal      TRUE    yes
sunny    mild           high        FALSE   no
sunny    cool           normal      FALSE   yes
rainy    mild           normal      FALSE   yes
sunny    mild           normal      TRUE    yes
overcast mild           high        TRUE    yes
overcast hot            normal      FALSE   yes
rainy    mild           high        TRUE    no

The histograms that Weka generated from the above data set are not clear to me.

[image: Weka histograms for the data set]

I know that blue means one can play and red means one cannot play. To draw a histogram, we need to find the frequency of each value.

In the above picture, the sunny count is 5 because the outlook attribute has 5 values that are sunny in the data set. Likewise, the overcast count is 4 because the outlook attribute has 4 values that are overcast. Whenever the outlook is overcast, one can play; there is no instance where one cannot play, so the overcast bar is pure blue. However, when the outlook is sunny, there are 2 instances where one can play and 3 instances where one cannot play. Therefore, the sunny bar is a mixture of blue and red.

Now, how can I tell, just by looking at the bar, how many instances are yes and how many are no when the outlook is sunny?

Also, how could I draw such a histogram in Excel?

Thank you.

CodePudding user response:

The bar-plot (nominal class) or histogram (numeric class) in Weka's Explorer is only there to give you an idea about the data. At the time of writing, there is no way to tell the various counts for the associated class labels from the graph apart from the total, which is displayed on top.
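Since the graph itself does not show them, the per-class counts can be worked out separately from the data. Here is a minimal Python sketch, assuming the outlook and play columns are copied by hand from the table in the question. The names `ROWS` and `class_counts` are made up for illustration and are not part of Weka.

```python
from collections import Counter, defaultdict

# (outlook, play) pairs copied from the question's data set, in row order.
ROWS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def class_counts(rows):
    """Return a mapping {attribute value: Counter of class labels}."""
    counts = defaultdict(Counter)
    for value, label in rows:
        counts[value][label] += 1
    return counts


counts = class_counts(ROWS)
for value, per_class in counts.items():
    # The total is the number Weka prints on top of each bar.
    total = sum(per_class.values())
    print(f"{value:<9} total={total} yes={per_class['yes']} no={per_class['no']}")
```

For this data set, the sunny bar splits into 2 yes and 3 no, overcast into 4 yes and 0 no, and rainy into 3 yes and 2 no. The totals (5, 4, 5) match the numbers shown on top of the bars.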

I don't use Excel, so I can't comment on that.

For reference: the responsible class in Weka's source code for drawing these is weka.gui.AttributeVisualizationPanel. The inner classes BarCalc and HistCalc perform the respective calculations.
