R Studio Aborting with large dataset?

Time:06-30

I'm comparing mass spec peaks to create a molecular dendrogram in R Studio. I have 88,336 elements, which take up 48.2 MB of memory in total. I am running this on a desktop with 64 GB RAM and an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz.

I am calculating the distances between the peaks in the igraph network 'net' with net.dist <- distances(net), and the session crashes with the message "R session aborted. R encountered a fatal error. The session was terminated."

I don't know enough about computers to remedy this issue. I assume it's because there are so many peaks to calculate, but I also assume the desktop should be able to handle them?

My R Studio session is only at 7.56 GiB during the crashes. The C drive has 513 GB free.

Thank you

CodePudding user response:

Currently R/igraph can only handle matrices with at most 2^31 - 1 elements, and it will crash without warning when asked for more. Future versions will be much more robust, and won't crash. For a graph with n vertices, the distance matrix has n*n elements, so the full distance matrix can't be computed for graphs with more than n = 46340 vertices. Your graph has 88336 vertices, well past that limit.

You can, however, compute the distance matrix piece by piece, by setting the v argument of distances to only part of the vertex set.
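A minimal sketch of that piece-by-piece approach is below. The small random graph stands in for your 'net', and chunk_size and the output file names are made up for illustration. Choose chunk_size so that chunk_size * vcount(net) stays well below 2^31 - 1.

```r
library(igraph)

# Small random graph standing in for the real 'net' (assumption, for illustration only).
set.seed(1)
net <- sample_gnp(200, 0.05)

chunk_size <- 50                      # hypothetical value; tune to your memory budget
n <- vcount(net)
starts <- seq(1, n, by = chunk_size)  # first vertex index of each chunk

for (s in starts) {
  idx <- s:min(s + chunk_size - 1, n)
  # Each block has one row per vertex in idx and one column per vertex in the graph.
  block <- distances(net, v = idx)
  # Save (or summarise) each block rather than keeping all of them in memory.
  saveRDS(block, file.path(tempdir(), sprintf("dist_block_%05d.rds", s)))
}
```

Each saved block can then be read back and processed on its own. Whether you write the blocks to disk or reduce each one to whatever your dendrogram step needs depends on what you do next.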


Note that the limit of 2^31 - 1 matrix elements comes from R's use of 32-bit integers, even on 64-bit platforms. Also note that storing that many elements as doubles already takes up 16 GiB of memory. The full distance matrix for 88336 vertices would take up about 58 GiB (roughly 62 GB), which leaves very little headroom on a 64 GB machine, and it would take a rather long time to compute.
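The numbers above can be checked with a few lines of R. This is only back-of-the-envelope arithmetic, assuming each matrix entry is stored as an 8-byte double. The variable names are just for illustration.

```r
max_elems <- 2^31 - 1              # largest matrix size R/igraph can currently handle

# Largest n for which an n-by-n matrix still fits under the limit.
max_n <- floor(sqrt(max_elems))    # 46340

n <- 88336                         # number of vertices in the question
exceeds_limit <- n^2 > max_elems   # TRUE: the full matrix is too large

# Memory needed at 8 bytes per element, in GiB.
limit_gib <- max_elems * 8 / 2^30  # just under 16
full_gib  <- n^2 * 8 / 2^30        # about 58.1

print(c(max_n = max_n, limit_gib = limit_gib, full_gib = full_gib))
```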
