I'm trying to generate a parallel coordinates plot, where each variable has its own axis. For instance:
So far, I've used the function ggparcoord() from the package GGally. However, as far as I can tell, it does not allow each variable to have its own axis as above.
Does anyone know how this may be done using R, preferably using ggplot2? Thanks in advance.
CodePudding user response:
I'm not aware of any packages that can do this, but it's not too difficult to draw the axes yourself in ggplot.
Let's say we have a similar dataset to the one shown in your example plot:
library(ggplot2)
library(dplyr)
cars <- mtcars %>%
  select(c(2:4, 6:7, 1)) %>%
  tibble::rownames_to_column("model") %>%
  as_tibble()
cars
#> # A tibble: 32 x 7
#>    model               cyl  disp    hp    wt  qsec   mpg
#>    <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4             6  160    110  2.62  16.5  21
#>  2 Mazda RX4 Wag         6  160    110  2.88  17.0  21
#>  3 Datsun 710            4  108     93  2.32  18.6  22.8
#>  4 Hornet 4 Drive        6  258    110  3.22  19.4  21.4
#>  5 Hornet Sportabout     8  360    175  3.44  17.0  18.7
#>  6 Valiant               6  225    105  3.46  20.2  18.1
#>  7 Duster 360            8  360    245  3.57  15.8  14.3
#>  8 Merc 240D             4  147.    62  3.19  20    24.4
#>  9 Merc 230              4  141.    95  3.15  22.9  22.8
#> 10 Merc 280              6  168.   123  3.44  18.3  19.2
#> # ... with 22 more rows
We can calculate the axis breaks (and set the coordinates of the tick marks) using some simple arithmetic:
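axis_df <- stack(cars[-1]) %>%
  group_by(ind) %>%
  # breaks are rescaled with the data's min and max (not the breaks' own min)
  # so that the ticks line up with the normalized data lines
  summarize(breaks = pretty(values, n = 10),
            yval = (breaks - min(values))/(max(values) - min(values))) %>%
  mutate(xmin = as.numeric(ind) - 0.05,
         xmax = as.numeric(ind),
         x_text = as.numeric(ind) - 0.2)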
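As a side note, summarize() returning more than one row per group is deprecated from dplyr 1.1.0 onwards, and reframe() is the intended replacement. A minimal sketch of just the break calculation with reframe(), assuming dplyr 1.1.0 or later is installed (`cars_num` and `breaks_df` are names made up for this illustration):

```r
library(dplyr)

# Same six columns as `cars`, taken straight from mtcars (illustrative name)
cars_num <- mtcars[c(2:4, 6:7, 1)]

# reframe() (assumes dplyr >= 1.1.0) allows several rows per group
# and returns an ungrouped data frame
breaks_df <- stack(cars_num) %>%
  group_by(ind) %>%
  reframe(breaks = pretty(values, n = 10))

head(breaks_df)
```

Because reframe() drops the grouping, any later step that relies on per-variable groups would need its own group_by().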
And the co-ordinates for our actual axis lines like this:
axis_line_df <- axis_df %>%
  group_by(ind) %>%
  summarize(min = min(yval), max = max(yval))
Now we need to reshape and normalize the original data:
lines_df <- cars[-1] %>%
  mutate(across(everything(), function(x) (x - min(x))/(max(x) - min(x)))) %>%
  stack() %>%
  mutate(row = rep(cars$model, ncol(cars) - 1))
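For the tick marks to line up with the normalized lines, the breaks need to be rescaled with the same minimum and maximum as the data. Here is a small check of that arithmetic for a single variable, using only base R (`rescale01` is a hypothetical helper written for this illustration, not a function from any package):

```r
# Min-max rescaling; by default uses the vector's own min and max
rescale01 <- function(x, lo = min(x), hi = max(x)) (x - lo) / (hi - lo)

mpg <- mtcars$mpg
mpg_scaled <- rescale01(mpg)
range(mpg_scaled)  # 0 and 1: the smallest and largest data values

# Rescale the breaks with the data's min and max, not the breaks' own
breaks <- pretty(mpg, n = 10)
tick_pos <- rescale01(breaks, lo = min(mpg), hi = max(mpg))

# pretty() breaks cover the data range, so the outermost ticks can sit
# slightly below 0 and slightly above 1
data.frame(breaks, tick_pos)
```

With this choice, a car whose mpg equals one of the breaks is drawn at exactly the height of that tick.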
Finally, the plotting code would be something like:
ggplot(lines_df, aes(ind, values, group = row)) +
  geom_line(color = "orange", alpha = 0.5) +
  geom_segment(data = axis_line_df, aes(x = ind, xend = ind, y = min, yend = max),
               inherit.aes = FALSE) +
  geom_segment(data = axis_df, aes(x = xmin, xend = xmax, y = yval, yend = yval),
               inherit.aes = FALSE) +
  geom_text(data = axis_df, aes(x = x_text, y = yval, label = breaks),
            inherit.aes = FALSE) +
  geom_text(data = axis_line_df, aes(x = ind, y = 1.2, label = ind),
            size = 6, inherit.aes = FALSE, check_overlap = TRUE, hjust = 1) +
  theme_void() +
  theme(plot.margin = margin(50, 20, 50, 20))
Created on 2021-10-24 by the reprex package (v2.0.0)
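If you need this for more than one dataset, the steps could be collected into a small function. The following is only a sketch under a few assumptions: `parcoord_own_axes` is a made-up name (it is not part of ggplot2 or GGally), every column of the input must be numeric with a non-zero range, and the variables are placed at numeric x positions 1, 2, 3, ... rather than on a discrete scale. Tick positions are rescaled with each variable's own minimum and maximum, the same as the lines.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: parallel coordinates with one axis per numeric column
parcoord_own_axes <- function(df, n_breaks = 10, line_color = "orange") {
  df <- as.data.frame(df)
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  vars <- names(df)

  # One row per observation and variable, each variable normalized to [0, 1]
  lines_df <- bind_rows(lapply(seq_along(vars), function(i) {
    x <- df[[i]]
    data.frame(x = i, row = seq_len(nrow(df)),
               values = (x - min(x)) / (max(x) - min(x)))
  }))

  # Breaks per variable, rescaled with the same min and max as the data
  axis_df <- bind_rows(lapply(seq_along(vars), function(i) {
    x <- df[[i]]
    b <- pretty(x, n = n_breaks)
    data.frame(x = i, ind = vars[i], breaks = b,
               yval = (b - min(x)) / (max(x) - min(x)),
               stringsAsFactors = FALSE)
  }))

  # Vertical extent of each axis line
  axis_line_df <- axis_df %>%
    group_by(x, ind) %>%
    summarize(ymin = min(yval), ymax = max(yval), .groups = "drop")

  ggplot(lines_df, aes(x, values, group = row)) +
    geom_line(color = line_color, alpha = 0.5) +
    geom_segment(data = axis_line_df,
                 aes(x = x, xend = x, y = ymin, yend = ymax),
                 inherit.aes = FALSE) +
    geom_segment(data = axis_df,
                 aes(x = x - 0.05, xend = x, y = yval, yend = yval),
                 inherit.aes = FALSE) +
    geom_text(data = axis_df,
              aes(x = x - 0.2, y = yval, label = breaks),
              inherit.aes = FALSE) +
    geom_text(data = axis_line_df,
              aes(x = x, y = ymax + 0.1, label = ind),
              size = 6, hjust = 1, inherit.aes = FALSE) +
    theme_void() +
    theme(plot.margin = margin(50, 20, 50, 20))
}

# Usage with the same columns as in the example above
p <- parcoord_own_axes(mtcars[c(2:4, 6:7, 1)])
p
```

Using numeric x positions throughout avoids mixing discrete and continuous values on the x scale, at the cost of having to draw the variable names yourself with geom_text().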