How to handle massive outliers in PCA Bi-Plots?

Time:10-25

I have the following code, which produces the bi-plot shown below:

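library(factoextra)

# Project the test countries onto the principal axes of the fitted PCA
countries.coord <- predict(res.pca, newdata = test_data_PCA)
# Plot the individuals used to fit the PCA, then overlay the projected countries in blue
p <- fviz_pca_ind(res.pca, repel = TRUE)
fviz_add(p, countries.coord, color = "blue")
# cos2 = squared coordinate / squared distance from the PCA centre (d2 must be defined beforehand)
cos2 <- function(countries.coord, d2) countries.coord^2 / d2
ind.cos2 <- apply(countries.coord, 2, cos2, d2)
ind.cos2[, 1:3]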
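The snippet uses d2 without showing where it comes from. A minimal sketch of one way it might be computed, assuming res.pca was fitted with prcomp(..., scale. = TRUE); the built-in USArrests data and the train/test split below are stand-ins for the real data, not part of the original code:

```r
# Stand-in data: fit the PCA on 40 rows, treat the other 10 as new individuals
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]
res.pca <- prcomp(train, scale. = TRUE)

# Coordinates of the new individuals on the principal axes
new.coord <- predict(res.pca, newdata = test)

# Squared distance of each new individual from the training centre,
# measured in the same centred and scaled space the PCA used
test.scaled <- scale(test, center = res.pca$center, scale = res.pca$scale)
d2 <- rowSums(test.scaled^2)

# cos2: share of each individual's squared distance captured by each axis
new.cos2 <- new.coord^2 / d2
new.cos2[, 1:3]
```

Because all principal components are kept here, the cos2 values of each individual sum to 1 across the axes, which is a quick sanity check on d2.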

The problem is that the massive outlier (USA) squeezes all the other points together, which hurts readability in my report. Is there a way to keep the outlier in the plot while still showing all the other countries clearly?

[Image: the resulting bi-plot, with the USA as an outlier]

Answer:

This isn't a complete answer, since we don't have the data to run your code, but below are the most common solutions to this kind of problem; you can decide which one you prefer. People who study data visualization can give better commentary on the pros and cons of each.

1. Use a discontinuous axis:

You remove the empty space by cutting the y (and/or x) axis, which gives something like this:

[Image: example plot with a discontinuous axis]

Cons: it's very arbitrary, and it can be misused to distort how the data looks.
Pros: it makes a very concise graph without changing the scale.

How to do it:
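One possible way in R is the ggbreak package, which adds axis breaks to ggplot2 objects. This is only a sketch: ggbreak is an assumption (it is not mentioned in the question), the scores are made-up stand-ins for the PCA coordinates, and the break limits are arbitrary.

```r
library(ggplot2)
library(ggbreak)  # assumed to be installed; not used in the original code

# Hypothetical scores standing in for the PCA coordinates
scores <- data.frame(
  name = c("A", "B", "C", "D", "USA"),
  PC1  = c(-1.2, 0.4, 1.1, -0.3, 40),
  PC2  = c(0.5, -0.8, 0.2, 1.0, 2.0)
)

p <- ggplot(scores, aes(PC1, PC2, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8)

# Cut out the empty stretch of the x axis between 3 and 38 (arbitrary limits)
p_broken <- p + scale_x_break(c(3, 38))
p_broken
```

Since fviz_pca_ind() returns a ggplot object, the same scale_x_break() call could probably be added to it as well, though that is untested here.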

[Image: example plot for the second option]

Cons: it has the same problems as option 1, and it is less compact.
Pros: it also doesn't affect linearity, and it is less arbitrary than option 1.

How to do it:

https://www.j4s8.de/post/2018-01-15-broken-axis-with-ggplot2/
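
A broken axis can also be approximated in plain ggplot2 by splitting the points into panels with free scales. A minimal sketch of that idea follows; the scores, the cutoff, and the panel column are hypothetical and chosen only for illustration:

```r
library(ggplot2)

# Hypothetical scores: nine ordinary countries plus one extreme point
set.seed(1)
scores <- data.frame(
  name = c(paste0("C", 1:9), "USA"),
  PC1  = c(rnorm(9), 40),
  PC2  = c(rnorm(9), 2)
)

# Assumed cutoff on PC1 separating the bulk of the points from the outlier
cutoff <- 10
scores$panel <- factor(
  ifelse(scores$PC1 > cutoff, "outlier", "main"),
  levels = c("main", "outlier")
)

# One panel per group, each with its own x range, so the gap is skipped
p <- ggplot(scores, aes(PC1, PC2, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  facet_grid(. ~ panel, scales = "free_x") +
  theme(strip.text = element_blank())
p
```

The y axis stays shared across the two panels, so vertical positions remain directly comparable; only the x axis is interrupted.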
