I'm reading "Applied Predictive Modeling" by Kuhn and Johnson. In the code for Section 3.3, "Data Transformations for Multiple Predictors", there is this snippet:
pr <- prcomp(~ AvgIntenCh1 + EntropyIntenCh1,
             data = segTrainTrans,
             scale. = TRUE)
Full code example here.
In the documentation for prcomp, I couldn't find much on how this first parameter is interpreted (the ~ AvgIntenCh1 + EntropyIntenCh1 formula). It just says:
formula: a formula with no response variable, referring only to
numeric variables.
How is that formula used by the prcomp call, and what does it mean?
CodePudding user response:
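I think it's just an alternative way of specifying which variables to run the PCA on. The formula has no response on the left-hand side, and the terms on the right of ~ name the columns of data to include. It seems to be equivalent to passing those columns as x instead of a formula: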
prcomp(iris[,-5])
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#>
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
prcomp(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
#> Standard deviations (1, .., p=4):
#> [1] 2.0562689 0.4926162 0.2796596 0.1543862
#>
#> Rotation (n x k) = (4 x 4):
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
Created on 2022-04-05 by the reprex package (v2.0.1)
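To make the equivalence a bit more concrete, here is a rough sketch of what I understand the formula method to do: it builds a model frame from the formula and data, drops the intercept, and hands the resulting numeric matrix to the ordinary matrix version of prcomp. The two-column formula and the object names (f, pca_formula, pca_matrix) are only for illustration, and the manual steps are my approximation of the behaviour rather than a quote from the documentation.

```r
# Illustrative formula on the built-in iris data (two numeric columns).
f <- ~ Sepal.Length + Sepal.Width

# Formula interface: the variables are looked up by name in `data`.
pca_formula <- prcomp(f, data = iris, scale. = TRUE)

# Approximately the same thing by hand: build the model frame,
# remove the intercept term, and pass the numeric matrix as `x`.
mf <- model.frame(f, data = iris)
mt <- attr(mf, "terms")
attr(mt, "intercept") <- 0L
x <- model.matrix(mt, mf)
pca_matrix <- prcomp(x, scale. = TRUE)

# Compare standard deviations and loadings from the two routes.
same_sdev <- isTRUE(all.equal(pca_formula$sdev, pca_matrix$sdev))
same_rotation <- isTRUE(all.equal(pca_formula$rotation, pca_matrix$rotation))
print(c(same_sdev = same_sdev, same_rotation = same_rotation))
```

By the same logic, the call from the book should behave like prcomp(segTrainTrans[, c("AvgIntenCh1", "EntropyIntenCh1")], scale. = TRUE), assuming both columns are numeric and contain no missing values.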