calculate the reconstruction error as the difference between the original and the reconstructed matr

Time:10-03

I am currently taking an online class in genomics, coming in as a wetlab physician, so my statistical knowledge is not the best. Right now we are working on PCA and SVD in R. I have a large matrix:

head(mat)
                ALL_GSM330151.CEL ALL_GSM330153.CEL ALL_GSM330154.CEL ALL_GSM330157.CEL ALL_GSM330171.CEL ALL_GSM330174.CEL ALL_GSM330178.CEL ALL_GSM330182.CEL
ENSG00000224137          5.326553          3.512053          3.455480          3.472999          3.639132          3.391880          3.282522          3.682531
ENSG00000153253          6.436815          9.563955          7.186604          2.946697          6.949510          9.095092          3.795587         11.987291
ENSG00000096006          6.943404          8.840839          4.600026          4.735104          4.183136          3.049792          9.736803          3.338362
ENSG00000229807          3.322499          3.263655          3.406379          9.525888          3.595898          9.281170          8.946498          3.473750
ENSG00000138772          7.195113          8.741458          6.109578          5.631912          5.224844          3.260912          8.889246          3.052587
ENSG00000169575          7.853829         10.428492         10.512497         13.041571         10.836815         11.964498         10.786381         11.953912 

Those are just the first few columns and rows; the full matrix has 60 columns and 1000 rows. Columns are cancer samples, rows are genes.

The task is to remove eigenvectors and reconstruct the matrix using SVD, then calculate the reconstruction error as the difference between the original and the reconstructed matrix. HINT: You have to use the svd() function and set the eigenvalue to $0$ for the component you want to remove.

I have been all over Google but can't find a way to solve this task, which might be because I don't really understand the question itself.

So I performed SVD on my matrix mat:

d <- svd(mat)

This gives me three matrices (eigenassays, eigenvalues and eigenvectors), which I can access using d$u and so on.

Can anyone give me a hint on how to proceed further? How do I set the eigenvalue to zero and ultimately calculate the error? Thanks for any help!

CodePudding user response:

See the documentation for svd: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/svd. The decomposition expresses your matrix mat as a product of three matrices:

mat = d$u %*% diag(d$d) %*% t(d$v)

So first confirm that you can do the matrix multiplications to get back mat.
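
A minimal sketch of that check. The random matrix is only a stand-in for the real expression data, and the name recon is made up for illustration:

```r
# Stand-in for the real expression matrix (assumption: random data,
# 1000 genes x 60 samples, same shape as in the question)
set.seed(1)
mat <- matrix(rnorm(1000 * 60), nrow = 1000, ncol = 60)

d <- svd(mat)

# Rebuild the matrix from its three factors; %*% is matrix multiplication
recon <- d$u %*% diag(d$d) %*% t(d$v)

# TRUE if the rebuilt matrix equals the original up to floating-point rounding.
# check.attributes = FALSE ignores row/column names, which the product drops.
all.equal(mat, recon, check.attributes = FALSE)
```

With the real data, only the two lines that create the stand-in matrix would need to be dropped.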

Once you are able to do this, set the last couple of elements of d$d to zero before doing the matrix multiplication.
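
As a rough sketch of that step, again on stand-in random data: the names d_trunc, mat_trunc and err are hypothetical, and zeroing exactly the last two singular values is an arbitrary choice for illustration.

```r
# Stand-in for the real expression matrix (assumption: random data)
set.seed(1)
mat <- matrix(rnorm(1000 * 60), nrow = 1000, ncol = 60)
d <- svd(mat)

# Copy the singular values and set the last two to zero
k <- length(d$d)
d_trunc <- d$d
d_trunc[(k - 1):k] <- 0

# Reconstruct without the removed components
mat_trunc <- d$u %*% diag(d_trunc) %*% t(d$v)

# Reconstruction error as the element-wise difference from the original
err <- mat - mat_trunc
```

To remove a different component, set the corresponding entry of d_trunc to zero instead, for example d_trunc[1] <- 0 for the first one.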

CodePudding user response:

It helps to create a function that handles the singular values.

Here, for instance, is one that zeros out any singular value that is too small compared to the largest singular value:

zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)

Although mathematically all singular values are guaranteed non-negative, numerical issues with floating point algorithms can--and do--create negative singular values, so I have prophylactically wrapped the singular values in a call to abs.

Apply this function to the diagonal matrix in the SVD of a matrix X and reconstruct the matrix by multiplying the components:

X. <- with(svd(X), u %*% diag(zap(d)) %*% t(v))
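
To see the effect, here is a small sketch on a made-up matrix whose third column is the sum of the first two, so it has rank 2 and one singular value that is numerically negligible. The test matrix and the names A and s are assumptions for illustration; zap is repeated so the snippet runs on its own.

```r
# Same thresholding idea as zap() above
zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)

# Hypothetical test matrix: third column = first + second, so the rank is 2
set.seed(1)
A <- matrix(rnorm(20), nrow = 10, ncol = 2)
X <- cbind(A, A[, 1] + A[, 2])

s <- svd(X)
s$d        # the last singular value is tiny, but usually not exactly zero
zap(s$d)   # after zapping it is exactly 0

# Reconstruct from the zapped singular values
X. <- with(s, u %*% diag(zap(d)) %*% t(v))
```

Because the zeroed singular value was negligible to begin with, X. should match X up to rounding in this case.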

There are many ways to assess the reconstruction error. One is the Frobenius norm of the difference,

sqrt(sum((X - X.)^2))
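
A short sketch comparing this with base R's norm() function, on made-up data with the first component removed. The variable names are hypothetical. For an SVD reconstruction, the Frobenius error should also equal the square root of the sum of squares of the singular values that were zeroed, which gives a handy sanity check.

```r
# Hypothetical data for illustration
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
s <- svd(X)

# Remove the first component by zeroing its singular value
d_kept <- s$d
d_kept[1] <- 0
X. <- s$u %*% diag(d_kept) %*% t(s$v)

frob_manual  <- sqrt(sum((X - X.)^2))      # formula from the answer
frob_builtin <- norm(X - X., type = "F")   # base R Frobenius norm
frob_removed <- sqrt(sum(s$d[1]^2))        # from the removed singular value(s)

# Relative error: fraction of the original matrix norm that was lost
rel_err <- frob_manual / norm(X, type = "F")
```

All three frob_ values should agree up to rounding.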