My data consists of three numeric variables. Something like this:
set.seed(1)
df <- data.frame(x= rnorm(10000), y= rnorm(10000))
df$col= df$x + df$y + df$x*df$y
Plotting this as a heatplot looks good:
library(ggplot2)
ggplot(df, aes(x, y, col= col)) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
But real variables can be skewed or contain outliers, and this totally changes the plot. After adding a single outlier with
df$col[nrow(df)] <- 100
the same ggplot code as above returns this plot:
Clearly, the problem is that this one point stretches the color scale, so nearly all other points fall into a narrow band of colors and the plot carries little information. My solution is to rank the data with rank(), which gives a reasonable color progression for every variable I've tried so far. See here:
ggplot(df, aes(x, y, col= rank(col))) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
The problem with this solution is that the new scale (2,500 to 10,000) is shown as the color label. I want the original scale (0 to 10) to be shown as the color label instead. In other words, the color progression should still correspond to the ranked data while the legend shows the original values; i.e. I need to somehow map the original values to the ranked color values. Is that possible? I tried to