I am having trouble with the dot plot function (geom_dotplot) applied to my data. I could reproduce the problem using the diamonds data. Some different values are grouped together when they shouldn't be. For example, the two yellow dots ("J") are aligned on the same row even though one is 64 and the other is 63.8, while another 63.8 value, coloured as "E", sits just below. I would like the dot positions to reflect the actual values more accurately. It looks as if the values are being rounded, and sometimes these values are split across several rows. I don't see the problem with other data (see Example 2).
Example 1
library(ggplot2)
data("diamonds")
dia=diamonds[1:30,]
dia[order(dia$depth, decreasing = TRUE), ]
   carat       cut color clarity depth table price    x    y    z
9   0.22      Fair     E     VS2  65.1    61   337 3.87 3.78 2.49
11  0.30      Good     J     SI1  64.0    55   339 4.25 4.28 2.73
19  0.30      Good     J     SI1  63.8    56   351 4.23 4.26 2.71
22  0.23 Very Good     E     VS2  63.8    55   352 3.85 3.92 2.48
18  0.30      Good     J     SI1  63.4    54   351 4.23 4.29 2.70
5   0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
21  0.30      Good     I     SI2  63.3    56   351 4.26 4.30 2.71
6   0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
12  0.23     Ideal     J     VS1  62.8    56   340 3.93 3.90 2.46
20  0.30 Very Good     J     SI1  62.7    59   351 4.21 4.27 2.66
27  0.24   Premium     I     VS1  62.5    57   355 3.97 3.94 2.47
4   0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
7   0.24 Very Good     I    VVS1  62.3    57   336 3.95 3.98 2.47
14  0.31     Ideal     J     SI2  62.2    54   344 4.35 4.37 2.71
28  0.30 Very Good     J     VS2  62.2    57   357 4.28 4.30 2.67
17  0.30     Ideal     I     SI2  62.0    54   348 4.31 4.34 2.68
8   0.26 Very Good     H     SI1  61.9    55   337 4.07 4.11 2.53
1   0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
23  0.23 Very Good     H     VS1  61.0    57   353 3.94 3.96 2.41
16  0.32   Premium     E      I1  60.9    58   345 4.38 4.42 2.68
30  0.23 Very Good     F     VS1  60.9    57   357 3.96 3.99 2.42
29  0.23 Very Good     D     VS2  60.5    61   357 3.96 3.97 2.40
13  0.22   Premium     F     SI1  60.4    61   342 3.88 3.84 2.33
26  0.23 Very Good     G    VVS2  60.4    58   354 3.97 4.01 2.41
15  0.20   Premium     E     SI2  60.2    62   345 3.79 3.75 2.27
2   0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
10  0.23 Very Good     H     VS1  59.4    61   338 4.00 4.05 2.39
24  0.31 Very Good     J     SI1  59.4    62   353 4.39 4.43 2.62
25  0.31 Very Good     J     SI1  58.1    62   353 4.44 4.47 2.59
3   0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE)
With other data (see below), which I created to see more clearly what was happening, the problem no longer occurs.
Example 2
abb=c(1,1.5,1.5,1.5,2,2,2,2.5,3.5,5,5,5.5,5.5)
bcc=c("Lyon", "Lyon", "Bordeaux", "Bordeaux", "Chambéry", "Lyon", "Lyon", "Nantes", "Nantes", "Lyon", "Lyon", "Rennes", "Lyon")
Fil=data.frame(abb,bcc)
> Fil
   abb      bcc
1  1.0     Lyon
2  1.5     Lyon
3  1.5 Bordeaux
4  1.5 Bordeaux
5  2.0 Chambéry
6  2.0     Lyon
7  2.0     Lyon
8  2.5   Nantes
9  3.5   Nantes
10 5.0     Lyon
11 5.0     Lyon
12 5.5   Rennes
13 5.5     Lyon
With the dot plot, the values 5 and 5.5 are grouped together. I would like the same, but with different "lines" for these values.
ggplot(Fil, aes(y=abb, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(bcc)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE) +
  scale_fill_manual(values = c("#FF8000", "#FF0033","#80FF00","#FFFF00", "#000000"))
How can I fix this problem?
CodePudding user response:
You just need to change the binwidth. Before I changed the binwidth, I was getting a message from ggplot2 about selecting it. The message said:
Bin width defaults to 1/30 of the range of the data. Pick better value with binwidth.
ggplot2 was telling you that binwidth was the issue.
ggplot(dia, aes(y=depth, x="")) +
  geom_boxplot() +
  geom_dotplot(aes(fill=factor(color)), binaxis='y', stackdir='center', dotsize=0.5, stackgroups = TRUE, binwidth = 0.2)
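To see why the default grouped 63.8 and 64 together, here is a rough sketch of the arithmetic behind that message. It assumes the default bin width really is 1/30 of the data range, as the message states, and that neighbouring values closer together than one bin width can end up in the same bin. The variable names (depth, default_binwidth, gap) are only for illustration.

```r
# Depth values copied from the 30-row listing in the question.
depth <- c(65.1, 64.0, 63.8, 63.8, 63.4, 63.3, 63.3, 62.8, 62.8, 62.7,
           62.5, 62.4, 62.3, 62.2, 62.2, 62.0, 61.9, 61.5, 61.0, 60.9,
           60.9, 60.5, 60.4, 60.4, 60.2, 59.8, 59.4, 59.4, 58.1, 56.9)

# Default bin width as described in the ggplot2 message:
# 1/30 of the range of the data.
default_binwidth <- diff(range(depth)) / 30

# Gap between the two values the question expects on separate rows.
gap <- 64.0 - 63.8

default_binwidth        # about 0.273
gap < default_binwidth  # TRUE: the gap is narrower than one default bin
```

The same arithmetic on Example 2 gives a range of 4.5 and a default bin width of 0.15, which is narrower than the 0.5 spacing of those values, so that data set does not show the effect.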
Fixing the bin width is just a band-aid, though. I think geom_jitter might be a good alternative.
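A minimal sketch of what that could look like, assuming you are happy to trade the stacked dots for horizontally jittered points. The width and size values are arbitrary choices of mine, and outlier.shape = NA is only there so the boxplot does not draw the outliers a second time. Setting height = 0 keeps every point at its exact depth value, so 63.8 and 64 can no longer be merged.

```r
library(ggplot2)

# Same 30-row subset as in the question.
dia <- diamonds[1:30, ]

p <- ggplot(dia, aes(x = "", y = depth)) +
  # Hide the boxplot's own outlier markers; the jittered points show them.
  geom_boxplot(outlier.shape = NA) +
  # Jitter horizontally only, so the y position is the true depth value.
  geom_jitter(aes(colour = factor(color)), width = 0.15, height = 0, size = 2)

print(p)
```

Because the horizontal offsets are random, the plot looks slightly different on each run; calling set.seed() beforehand should make it reproducible.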