The meaning of the independent variables in prcomp formula

Time:04-06

I'm reading "Applied Predictive Modeling" by Kuhn and Johnson. In the code for chapter 3.3 Data Transformations for Multiple Predictors, there is a code snippet:

pr <- prcomp(~ AvgIntenCh1 + EntropyIntenCh1,
             data = segTrainTrans,
             scale. = TRUE)

Full code example here.

In the documentation for prcomp, I couldn't find much on how this first parameter is even interpreted (this ~ AvgIntenCh1 + EntropyIntenCh1 formula). It just says:

 formula: a formula with no response variable, referring only to
          numeric variables.

How is that formula used by the prcomp call, and what does it mean?

CodePudding user response:

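I think it's just an alternative way of specifying which variables to run the PCA on. The right-hand side lists the columns to take from data, joined with +, and the left-hand side of the ~ is empty because PCA has no response variable. It seems to be equivalent to passing those same columns as x instead of a formula: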

prcomp(iris[,-5])
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#> 
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

prcomp(~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#> 
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

Created on 2022-04-05 by the reprex package (v2.0.1)
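
To check the equivalence programmatically rather than by eye, here is a minimal sketch on the built-in iris data. It uses two columns with scale. = TRUE to mirror the shape of the call in the question; the variable names (vars, pca_formula, pca_matrix, x_from_formula) are only illustrative. The last step assumes the formula method builds its input much like model.matrix does with the intercept removed, which is my reading of how it behaves, not something the quoted documentation spells out.

```r
# Formula interface vs. passing the columns directly (built-in iris data).
vars <- c("Sepal.Length", "Petal.Length")

pca_formula <- prcomp(~ Sepal.Length + Petal.Length, data = iris, scale. = TRUE)
pca_matrix  <- prcomp(iris[, vars], scale. = TRUE)

# Compare standard deviations and loadings from the two calls.
same_sdev     <- isTRUE(all.equal(pca_formula$sdev, pca_matrix$sdev))
same_rotation <- isTRUE(all.equal(pca_formula$rotation, pca_matrix$rotation))
print(c(sdev = same_sdev, rotation = same_rotation))

# Assumption: the formula is turned into a plain numeric matrix of the
# named columns, with no intercept column, before the PCA is computed.
x_from_formula <- model.matrix(~ Sepal.Length + Petal.Length - 1, data = iris)
print(colnames(x_from_formula))
```

One practical difference is that the formula interface also accepts subset and na.action arguments, which can be convenient when the data frame has missing values.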
