geom_dotplot with decimal values

Time:12-01

I am having trouble with geom_dotplot applied to my data. I could reproduce the problem using the diamonds data. The problem is that some different values are grouped together when they shouldn't be. For example, the two yellow dots ("J") are aligned even though one is 64 and the other is 63.8. There is another 63.8 value, colored as "E", just below. I would like the dots placed more accurately according to their values. It seems that values are rounded and that sometimes these values are split across several lines. I don't see the problem with other data (see Example 2).

Example 1

library(ggplot2)
data("diamonds")
dia=diamonds[1:30,]
dia[order(dia$depth, decreasing = TRUE), ]

   carat       cut color clarity depth table price    x    y    z
9   0.22      Fair     E     VS2  65.1    61   337 3.87 3.78 2.49
11  0.30      Good     J     SI1  64.0    55   339 4.25 4.28 2.73
19  0.30      Good     J     SI1  63.8    56   351 4.23 4.26 2.71
22  0.23 Very Good     E     VS2  63.8    55   352 3.85 3.92 2.48
18  0.30      Good     J     SI1  63.4    54   351 4.23 4.29 2.70
5   0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
21  0.30      Good     I     SI2  63.3    56   351 4.26 4.30 2.71
6   0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
12  0.23     Ideal     J     VS1  62.8    56   340 3.93 3.90 2.46
20  0.30 Very Good     J     SI1  62.7    59   351 4.21 4.27 2.66
27  0.24   Premium     I     VS1  62.5    57   355 3.97 3.94 2.47
4   0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
7   0.24 Very Good     I    VVS1  62.3    57   336 3.95 3.98 2.47
14  0.31     Ideal     J     SI2  62.2    54   344 4.35 4.37 2.71
28  0.30 Very Good     J     VS2  62.2    57   357 4.28 4.30 2.67
17  0.30     Ideal     I     SI2  62.0    54   348 4.31 4.34 2.68
8   0.26 Very Good     H     SI1  61.9    55   337 4.07 4.11 2.53
1   0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
23  0.23 Very Good     H     VS1  61.0    57   353 3.94 3.96 2.41
16  0.32   Premium     E      I1  60.9    58   345 4.38 4.42 2.68
30  0.23 Very Good     F     VS1  60.9    57   357 3.96 3.99 2.42
29  0.23 Very Good     D     VS2  60.5    61   357 3.96 3.97 2.40
13  0.22   Premium     F     SI1  60.4    61   342 3.88 3.84 2.33
26  0.23 Very Good     G    VVS2  60.4    58   354 3.97 4.01 2.41
15  0.20   Premium     E     SI2  60.2    62   345 3.79 3.75 2.27
2   0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
10  0.23 Very Good     H     VS1  59.4    61   338 4.00 4.05 2.39
24  0.31 Very Good     J     SI1  59.4    62   353 4.39 4.43 2.62
25  0.31 Very Good     J     SI1  58.1    62   353 4.44 4.47 2.59
3   0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE)


With other data (see below), which I created to better see what was happening, the problem no longer appears.

Example 2

abb=c(1,1.5,1.5,1.5,2,2,2,2.5,3.5,5,5,5.5,5.5)
bcc=c("Lyon", "Lyon", "Bordeaux", "Bordeaux", "Chambéry", "Lyon", "Lyon", "Nantes", "Nantes", "Lyon", "Lyon", "Rennes", "Lyon")
Fil=data.frame(abb,bcc)

> Fil
   abb      bcc
1  1.0     Lyon
2  1.5     Lyon
3  1.5 Bordeaux
4  1.5 Bordeaux
5  2.0 Chambéry
6  2.0     Lyon
7  2.0     Lyon
8  2.5   Nantes
9  3.5   Nantes
10 5.0     Lyon
11 5.0     Lyon
12 5.5   Rennes
13 5.5     Lyon

With the dotplot, the values 5 and 5.5 are grouped together. I would like the same plot but with separate "lines" for these values.

ggplot(Fil, aes(y=abb, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(bcc)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE) +
  scale_fill_manual(values = c("#FF8000", "#FF0033","#80FF00","#FFFF00", "#000000"))


How can I fix this problem?

CodePudding user response:

You just need to change the binwidth. Before I changed it, ggplot2 printed a message about selecting the binwidth:

Bin width defaults to 1/30 of the range of the data. Pick better value with binwidth.

ggplot2 was telling you that binwidth was the issue. Setting it explicitly:

ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE, binwidth = 0.2)
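
To see why the default groups 63.8 and 64.0 together, the "1/30 of the range" rule from the message can be checked by hand. This is a rough sketch in base R: the depth values are copied from the table in the question, and the variable names (default_binwidth, gap) are only illustrative. It applies the rule stated in the message and does not reproduce ggplot2's internal binning code.

```r
# Depth values of the 30 diamonds listed in the question
depth <- c(65.1, 64.0, 63.8, 63.8, 63.4, 63.3, 63.3, 62.8, 62.8, 62.7,
           62.5, 62.4, 62.3, 62.2, 62.2, 62.0, 61.9, 61.5, 61.0, 60.9,
           60.9, 60.5, 60.4, 60.4, 60.2, 59.8, 59.4, 59.4, 58.1, 56.9)

# Default bin width according to the message: 1/30 of the data range
default_binwidth <- diff(range(depth)) / 30   # (65.1 - 56.9) / 30, about 0.273

# Distance between the two "J" dots that end up side by side
gap <- 64.0 - 63.8

# TRUE: the gap is smaller than the default bin width,
# so both values can fall into the same bin
gap < default_binwidth
```

A bin width of about 0.27 is wider than the 0.2 gap between the two values, which is consistent with them being stacked on the same line until binwidth is reduced.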


Fixing the bin width is just a band-aid. I think using geom_jitter might be a good alternative.
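
A minimal sketch of the geom_jitter alternative, assuming the same dia subset as in the question. The width, size, and seed values are arbitrary choices for illustration, not recommendations. Setting height = 0 means points are only moved horizontally, so each dot stays at its exact depth value and no binning is involved.

```r
library(ggplot2)

# Same subset as in the question
dia <- diamonds[1:30, ]

# Jitter is random; fix the seed so the plot is reproducible
set.seed(1)

p <- ggplot(dia, aes(x = "", y = depth)) +
  # Hide boxplot outliers so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Spread points horizontally only; y positions are the true depths
  geom_jitter(aes(colour = color), width = 0.15, height = 0, size = 2)

p
```

The trade-off is that jittered points are not stacked in neat rows as in a dot plot, and points with close values can still overlap if width is small.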
