I have a dataset called df that looks like this:
head(df[, 1:3])
ratio | P | T | H | S | p1 | p2 | PM10 | CO2 | B | G | Month | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5 | 89 | -7 | 98 | 133 | 0 | 40 | 50 | 30 | 3 | 20 | 1 | 2019 |
0.5 | 55 | 4 | 43 | 43 | 30 | 30 | 40 | 32 | 1 | 15 | 1 | 2019 |
0.85 | 75 | 4 | 63 | 43 | 30 | 30 | 42 | 32 | 1 | 18 | 1 | 2019 |
I would like to do a principal component analysis to reduce the number of variables for a regression analysis. I ran this code:
library(factoextra)
df.pca <- prcomp(df, scale = TRUE)
But I got this error message, so I was not able to continue:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
What am I doing wrong?
CodePudding user response:
prcomp() will assume that every column in the object you are passing to it should be used in the analysis. You'll need to drop any non-numeric columns, as well as any numeric columns that should not be used in the PCA.
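To see which columns are triggering the error, it can help to look at each column's class before calling prcomp(). Here is a minimal sketch, assuming a small made-up data frame called toy (hypothetical, standing in for the real df) in which one numeric-looking column was read in as text:

```r
# Hypothetical stand-in for the real data: P was read in as character
toy <- data.frame(
  ratio = c(0.5, 0.5, 0.85),
  P     = c("89", "55", "75"),
  T     = c(-7, 4, 4),
  stringsAsFactors = FALSE
)

# Class of every column; a character or factor column will make prcomp() fail
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the columns that are not numeric
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)
```

If a column listed in non_numeric is really meant to be a number, converting it with as.numeric(as.character(x)) may be preferable to dropping it; values that cannot be parsed will become NA with a warning.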
library(factoextra)
# Example data
df <- data.frame(
x = letters,
y1 = rbinom(26,1,0.5),
y2 = rnorm(26),
y3 = 1:26,
id = 1:26
)
# Reproduce your error
prcomp(df)
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
# Remove all non-numeric columns
df_nums <- df[sapply(df, is.numeric)]
# Conduct PCA - works but ID column is in there!
prcomp(df_nums, scale = TRUE)
#> Standard deviations (1, .., p=4):
#> [1] 1.445005e+00 1.039765e+00 9.115092e-01 1.333315e-16
#>
#> Rotation (n x k) = (4 x 4):
#> PC1 PC2 PC3 PC4
#> y1 0.27215111 -0.5512026 -0.7887391 0.000000e+00
#> y2 0.07384194 -0.8052981 0.5882536 4.715914e-16
#> y3 -0.67841033 -0.1543868 -0.1261909 -7.071068e-01
#> id -0.67841033 -0.1543868 -0.1261909 7.071068e-01
# Remove ID
df_nums$id <- NULL
# Conduct PCA without ID - success!
prcomp(df_nums, scale = TRUE)
#> Standard deviations (1, .., p=3):
#> [1] 1.1253120 0.9854030 0.8733006
#>
#> Rotation (n x k) = (3 x 3):
#> PC1 PC2 PC3
#> y1 -0.6856024 0.05340108 -0.7260149
#> y2 -0.4219813 -0.84181344 0.3365738
#> y3 0.5931957 -0.53712052 -0.5996836
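Applied to data shaped like the one in the question, the same steps might look roughly like the sketch below. The values are simulated, not the asker's data, and df_demo is a hypothetical name. It also assumes that ratio is the regression response and that Month and Year are labels rather than predictors; adjust drop_cols if that is not the case.

```r
# Simulated stand-in for the question's df (made-up values, similar column names)
set.seed(1)
n <- 50
df_demo <- data.frame(
  ratio = runif(n),
  P     = rnorm(n, 70, 10),
  T     = rnorm(n, 5, 8),
  H     = rnorm(n, 60, 15),
  PM10  = rnorm(n, 45, 5),
  Month = sample(1:12, n, replace = TRUE),
  Year  = 2019
)

# Assumption: ratio is the response; Month and Year are labels, not predictors
drop_cols  <- c("ratio", "Month", "Year")
predictors <- df_demo[, !(names(df_demo) %in% drop_cols)]

# Keep numeric columns only, then run the PCA on standardized variables
predictors <- predictors[sapply(predictors, is.numeric)]
pca <- prcomp(predictors, scale. = TRUE)

# Cumulative proportion of variance explained by the components
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Component scores (one row per observation), usable as regression inputs
scores <- as.data.frame(pca$x)
```

Note that a column with a single repeated value, such as Year when all rows come from one year, cannot be rescaled to unit variance, so leaving it in would make prcomp() stop with a different error when scaling is requested.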