R best visualization

Time:04-19

I would like to visualize two vectors (predikcia and test data) containing all the wrongly classified digits from my classification problem. Both vectors have 76 elements: the first one (predikcia) holds the digits 0-9 that the classifier wrongly predicted, and the second one (test data) holds the digits they should have been. A basic plot of these vectors is not a good representation: it gives little information about which digits were misclassified and which digits they should have been classified as. Here is a picture of what the basic plot, plot(a,b), shows:

data

classres <- data.frame(
  predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L, 
    6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L, 
    5L, 1L, 8L, 9L, 5L, 0L, 1L, 9L, 5L, 5L, 8L, 9L,
    2L, 5L, 8L, 5L, 6L, 9L, 9L, 4L, 9L, 3L, 5L, 5L, 9L, 9L, 9L, 4L, 3L,    
    5L, 8L, 3L, 0L, 5L, 8L, 8L, 7L, 3L, 8L, 8L, 5L, 9L, 9L, 1L, 5L, 5L,
    9L, 9L, 5L, 3L, 1L, 9L, 2L, 5L, 8L, 9L),
  testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L, 
   9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L, 8L,
   8L, 0L, 5L, 1L, 8L, 4L, 1L, 9L, 1L, 0L, 5L, 1L,
   9L, 0L, 0L, 0L, 4L, 1L, 2L, 7L, 5L, 9L, 8L, 5L,
   5L, 5L, 1L, 9L, 9L, 0L, 9L, 8L, 9L, 6L, 0L, 8L,
   5L, 0L, 9L, 8L, 5L, 5L, 9L, 2L, 8L, 0L, 5L, 7L,
   1L, 8L, 8L, 9L, 9L, 7L, 1L))

CodePudding user response:

I'm assuming that predictions are either "correct" or "incorrect"; otherwise the graph would need more work.

First, I have the data, which contain the predictions and the real values. In this example they are integers, but I'm treating them as class labels with no numeric meaning.

classres <- data.frame(
  predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L, 
                6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L), 
  testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L, 
               9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L))

Then I create a count data-frame. The "factor" part is important because I want all the possible combinations to appear on the plot.

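library(dplyr)  # provides %>%, count() and mutate()

# Count each (testdata, predikcia) pair, then fix the levels to 0:9
dat.plot <- classres %>% 
  count(testdata, predikcia) %>% 
  mutate(
    testdata = factor(testdata, levels = 0:9),
    predikcia = factor(predikcia, levels = 0:9)) 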
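
As a quick sanity check on the counts, the same cross-tabulation can be sketched in base R with table(). This is not the method used above, only an alternative view: setting the factor levels to 0:9 before tabulating keeps every digit in the table, including pairs that never occur. The names conf_tab and conf_long are made up for this illustration.

```r
# Same 20-row subset of the data as used in this answer.
classres <- data.frame(
  predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L,
                6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L),
  testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L,
               9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L))

# Fixing the levels to 0:9 keeps all ten digits on both margins,
# even digits that do not appear in the data.
conf_tab <- table(
  predikcia = factor(classres$predikcia, levels = 0:9),
  testdata  = factor(classres$testdata,  levels = 0:9))

print(conf_tab)

# Long format: one row per (predikcia, testdata) pair, zero counts included.
conf_long <- as.data.frame(conf_tab, responseName = "n")
```

Unlike count(), which only returns the pairs that were observed, the long table here has one row for each of the 100 possible pairs.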

Finally, I create a heatmap from the data coloring the inside of each cell with the count values and adding a border to the cells where predictions are considered correct (this is why I need the goodclass data-frame).

goodclass <- data.frame(
  testdata = factor(0:9),
  predikcia = factor(0:9)
)

library(ggplot2)

dat.plot %>%
  ggplot(aes(testdata, predikcia, fill = n)) +
  # One tile per observed (real, predicted) pair, filled by its count
  geom_tile() +
  scale_fill_gradient(low = "goldenrod", high = "darkorchid4") +
  # Outline the diagonal cells, where the prediction equals the real class
  geom_tile(data = goodclass, 
            aes(testdata, predikcia, color = "Correct\npredictions"), 
            inherit.aes = FALSE, fill = NA, size = 2) +
  scale_color_manual(values = c(`Correct\npredictions` = "limegreen")) +
  labs(x = "Real class value", y = "Predicted class value", 
       fill = "count", color = "") +
  coord_equal() +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_line(color = "black", size = 2))

And the result hurts the eyes a little: it will probably need a bit more work to find more beautiful colors.

Heatmap for predictions
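
One possible way to get a calmer palette is sketched below. It is only a suggestion, not part of the original plot: it assumes a ggplot2 version that provides scale_fill_viridis_c() (3.0.0 or later), swaps the goldenrod/orchid gradient for viridis, and prints the count inside each tile. To keep the sketch self-contained it builds the counts with base R table() instead of dplyr, and it leaves out the green diagonal outline.

```r
library(ggplot2)

# Same 20-row subset of the data as used in this answer.
classres <- data.frame(
  predikcia = c(9L, 8L, 3L, 9L, 1L, 6L, 2L, 2L,
                6L, 3L, 5L, 9L, 8L, 1L, 5L, 1L, 3L, 3L, 5L, 9L),
  testdata = c(4L, 6L, 1L, 5L, 5L, 1L, 1L, 1L, 5L,
               9L, 7L, 8L, 0L, 8L, 8L, 9L, 7L, 1L, 9L, 5L))

# Count each (testdata, predikcia) pair; levels 0:9 keep all digits available.
dat.plot <- as.data.frame(
  table(testdata  = factor(classres$testdata,  levels = 0:9),
        predikcia = factor(classres$predikcia, levels = 0:9)),
  responseName = "n")
dat.plot <- dat.plot[dat.plot$n > 0, ]  # draw only the observed pairs

p <- ggplot(dat.plot, aes(testdata, predikcia, fill = n)) +
  geom_tile(color = "white") +
  # Print the count in each tile so the legend is not the only cue
  geom_text(aes(label = n), color = "white") +
  # end = 0.8 avoids the bright yellow end, so white labels stay readable
  scale_fill_viridis_c(option = "D", end = 0.8) +
  # drop = FALSE keeps all ten digits on both axes
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE) +
  labs(x = "Real class value", y = "Predicted class value", fill = "count") +
  coord_equal() +
  theme_minimal()

print(p)
```

Whether viridis looks better than the original gradient is a matter of taste; the option argument ("A" to "E" in older versions) selects other palettes from the same family.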
