How to create a matrix with huge number of rows

Time:12-25

I want to create a big data frame or matrix with 49 columns and 35,886,996,700 rows. When I try to create the matrix, I get an error:

data <- data.frame(matrix(NA,    # Create empty data frame
                          nrow = (length(genes_union)*length(snp_union)),
                          ncol = col_length))
Error in matrix(NA, nrow = (length(genes_union) * length(snp_union)),  :
  invalid 'nrow' value (too large or NA)
In addition: Warning message:
In length(genes_union) * length(snp_union) :
  NAs produced by integer overflow

I also tried big.matrix from the bigmemory package:

z <- big.matrix(,nrow=35886996700,ncol=49)

Error in big.matrix(, nrow = 35886996700, ncol = 49) :
  Error: memory could not be allocated for instance of type big.matrix

Is there any way to solve this problem so that I can create a matrix with this many rows?

Basically, my final output matrix should look like this, where G represents a gene, RS represents an SNP ID, and T represents a tissue:

       T1 T2 T3 ...Tn
G1RS1
G1RS2
G1RSn
G2RS1
G2RS2
G2RSn
GnRSn

CodePudding user response:

dataframe <- data.frame(matrix(ncol = 10000, nrow = 5000))
print(dataframe)

How large is 'nrow' in your case? I tried with a static value; I hope this helps.
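
The "NAs produced by integer overflow" warning in the question suggests that nrow became NA before matrix() was even called, because length() returns integers and their product exceeded .Machine$integer.max. A minimal sketch of that behaviour, using hypothetical lengths (n_genes and n_snps are made-up stand-ins for length(genes_union) and length(snp_union)):

```r
# Hypothetical lengths, chosen so that their product exceeds the integer limit
n_genes <- 50000L
n_snps  <- 50000L

# integer * integer overflows and yields NA (with a warning, suppressed here)
overflowed <- suppressWarnings(n_genes * n_snps)

# Converting one operand to double first avoids the overflow
as_double <- as.numeric(n_genes) * n_snps

is.na(overflowed)                  # the integer product is NA
as_double > .Machine$integer.max   # the true product does not fit in an integer
```

Avoiding the overflow does not make the original call work, though: each dimension of a base R matrix must itself fit in an integer, so a row count of 35886996700 would still be rejected by matrix().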

CodePudding user response:

I tried to generate a vector of 0's with length 35886996700 * 49:

x1 <- 35886996700
x1
[1] 3.5887e+10
x2 <- 49
vec1 <- rep(0, x1 * x2)
Error: cannot allocate vector of size 13101.6 Gb
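
The size in that error message can be checked by hand. A rough sketch, assuming 8 bytes per numeric (double) value and that R's "Gb" means 1024^3 bytes:

```r
n_rows <- 35886996700        # too large for an integer, so R stores it as a double
n_cols <- 49
bytes_per_double <- 8        # each numeric value takes 8 bytes

# Total size in units of 1024^3 bytes
size_gib <- n_rows * n_cols * bytes_per_double / 1024^3
round(size_gib, 1)
```

This comes out at roughly 13101.6, matching the figure R reports.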

I can't see any way to process or manage roughly 13,101 GB of data on an ordinary machine. The big question is whether the matrix is extremely sparse. If it is, you may be able to store the data in a much more compact sparse format. If sparse storage is feasible, see the Matrix package, a recommended package that ships with standard R installations: https://www.rdocumentation.org/packages/Matrix/versions/1.5-3
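
As a rough illustration of sparse storage, here is a minimal sketch using sparseMatrix() from the Matrix package. The data are hypothetical toy values (three nonzero entries in a 1,000,000 x 49 matrix), not the asker's data:

```r
library(Matrix)

# Hypothetical toy data: only three nonzero cells in a 1e6 x 49 matrix
i <- c(1, 500000, 1000000)   # row indices of the nonzero entries
j <- c(1, 25, 49)            # column indices of the nonzero entries
x <- c(0.5, 1.2, 3.4)        # the nonzero values themselves

m <- sparseMatrix(i = i, j = j, x = x, dims = c(1e6, 49))

dim(m)      # 1000000 rows, 49 columns
nnzero(m)   # number of entries actually stored as nonzero
```

Only the nonzero entries are stored, so memory use scales with their count rather than with rows times columns. One caveat: these sparse classes also keep their dimensions as integers, so a single object still cannot have 35886996700 rows. The data would most likely need to be split into blocks, for example one sparse matrix per gene or per group of genes.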
