Replace randomly 30% of non-zero elements in a matrix to 0 using R

Time:07-11

I have a huge matrix, say M1, and I want to create a new matrix M2 that is a copy of M1 with 30% of M1's non-zero values replaced by 0.

Please let me know how to work around this.

CodePudding user response:

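Sample from which(m > 0), which gives the indices of the positive elements (here, all the non-zero ones). I use <- 999 to demonstrate; just replace it with <- 0. Note that length(m)*.3 is 30% of all cells, so the call below replaces 15 of the 50 cells, all of them picked among the non-zero ones. To replace 30% of the non-zero cells instead, use the length of the which() result as the base, and test m != 0 if the matrix can contain negative values.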

m
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    9    4    9    7    3
#  [2,]    9    7    1    8    4
#  [3,]    2    9    9    3    0
#  [4,]    8    2    9    6    9
#  [5,]    6    4    0    0    4
#  [6,]    5    9    5    8    9
#  [7,]    7    9    3    0    8
#  [8,]    1    1    9    2    6
#  [9,]    6    4    4    9    9
# [10,]    7    5    8    6    6

m[sample(which(m > 0), length(m)*.3)] <- 999
m
#       [,1] [,2] [,3] [,4] [,5]
#  [1,]    9    4  999    7  999
#  [2,]  999    7    1    8    4
#  [3,]  999    9  999    3    0
#  [4,]  999    2  999    6  999
#  [5,]  999    4    0    0    4
#  [6,]  999    9    5  999    9
#  [7,]    7    9  999    0    8
#  [8,]    1    1    9  999    6
#  [9,]    6    4  999  999    9
# [10,]    7    5    8    6    6

sum(m == 999)/length(m)  ## check
# [1] 0.3

Data:

set.seed(42)
m <- matrix(trunc(runif(50, 0, 1)*10), 10, 5)
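If the goal is 30% of the non-zero cells rather than 30% of all cells, the same idea can be wrapped as below. This is only a sketch: the name replace_nonzero and its arguments are made up for illustration, and it rounds the count down, the same way sample() truncates a fractional size.

```r
# Hypothetical helper: set a fraction of the non-zero cells of `m` to `value`.
replace_nonzero <- function(m, frac = 0.3, value = 0) {
  nz <- which(m != 0)             # linear indices of non-zero cells (NA cells are skipped)
  k  <- floor(length(nz) * frac)  # number of cells to replace, rounded down
  # Index into nz via sample.int(); sample(nz, k) would treat a single
  # index n as the range 1:n.
  m[nz[sample.int(length(nz), k)]] <- value
  m
}

set.seed(42)
M1 <- matrix(trunc(runif(50, 0, 1) * 10), 10, 5)
M2 <- replace_nonzero(M1)         # M1 itself is left unchanged

# Check: exactly floor(30%) of the originally non-zero cells are now 0.
n_nonzero <- sum(M1 != 0)
n_changed <- sum(M1 != 0 & M2 == 0)
stopifnot(n_changed == floor(n_nonzero * 0.3))
```

Because the function returns a modified copy, M1 keeps its original values and M2 can be compared against it directly.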

CodePudding user response:

The trick is to restrict the sampling to the indices of the non-zero elements, like this:

M1<-matrix(rnorm(36),nrow=6)
M2 <- M1
M2
#>            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#> [1,]  1.1450903 -1.3354652  1.7408616  2.4104801  1.0190374 -0.4452658
#> [2,] -0.6193147  0.6247960  0.8880114  0.2063487  1.4564834 -1.6591764
#> [3,] -1.4440763 -0.1740776  2.1646262 -1.3795811 -0.2231788 -2.1524281
#> [4,]  1.0929878  2.4982284 -1.5304989  1.0759637  0.2585276  0.3428240
#> [5,] -1.4013196 -0.3208720  0.8025738 -0.7251131  0.1134538 -1.2704551
#> [6,] -0.7992393  0.5610579  2.0940327  1.1937530 -1.5585291 -1.0766868
nz <- which(M2 != 0)  # indices of all non-zero cells, negatives included
M2[sample(nz, length(nz) * 0.3, replace = FALSE)] <- 0
M2
#>            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#> [1,]  1.1450903 -1.3354652  0.0000000  0.0000000  0.0000000 -0.4452658
#> [2,] -0.6193147  0.0000000  0.8880114  0.2063487  0.0000000 -1.6591764
#> [3,] -1.4440763 -0.1740776  2.1646262 -1.3795811 -0.2231788 -2.1524281
#> [4,]  0.0000000  0.0000000 -1.5304989  1.0759637  0.2585276  0.0000000
#> [5,] -1.4013196 -0.3208720  0.8025738 -0.7251131  0.0000000 -1.2704551
#> [6,] -0.7992393  0.5610579  0.0000000  1.1937530 -1.5585291 -1.0766868

Created on 2022-07-11 by the reprex package (v2.0.1)

Option that jay mentioned in the comments (999 is used as the replacement so the changed cells stand out; use 0 in practice):

M1<-matrix(rnorm(36),nrow=6)
M2 <- M1
M2
#>            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#> [1,]  0.2704036 1.66744279  1.2249968  0.7105401  0.2930494  0.3019442
#> [2,]  0.6701630 0.23103360  0.3433342 -0.9176159  0.2890372 -1.3139269
#> [3,] -0.7845245 0.64272243  0.3152463  0.2794443  0.3818046 -1.7073781
#> [4,]  1.3994086 0.04721819 -0.1364107 -0.2889496  1.7605232  1.0270522
#> [5,]  0.8934011 0.53878503 -1.6008799 -0.4516311 -1.1541206 -1.3896758
#> [6,]  0.3205831 1.15597968 -0.4654826 -1.3999804 -1.0597505  0.2982040
i <- M2 != 0 
M2[i] <- replace(M2[i], sample(sum(i), sum(i)*.3), 999)
M2
#>             [,1]       [,2]        [,3]        [,4]        [,5]        [,6]
#> [1,]   0.2704036 1.66744279   1.2249968 999.0000000 999.0000000   0.3019442
#> [2,]   0.6701630 0.23103360   0.3433342  -0.9176159   0.2890372 999.0000000
#> [3,]  -0.7845245 0.64272243 999.0000000   0.2794443   0.3818046  -1.7073781
#> [4,]   1.3994086 0.04721819  -0.1364107  -0.2889496 999.0000000   1.0270522
#> [5,] 999.0000000 0.53878503 999.0000000  -0.4516311  -1.1541206 999.0000000
#> [6,]   0.3205831 1.15597968  -0.4654826  -1.3999804 999.0000000 999.0000000
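To see why the logical mask works, here is a small deterministic sketch on a plain vector (the vector x is made up for illustration). Positions passed to replace() count only within the non-zero values, so zeros can never be picked.

```r
x <- c(5, 0, 7, 0, 9)
i <- x != 0                    # TRUE for the non-zero positions
nonzero_values <- x[i]         # 5 7 9
# Replace the 2nd non-zero value, then write the values back in place.
x[i] <- replace(nonzero_values, 2, 999)
x
# [1]   5   0 999   0   9
```

In the matrix version, sample(sum(i), sum(i)*.3) simply supplies a random set of such positions instead of the fixed 2.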

First answer

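You can sample indices across the whole matrix and set that many elements to 0, where the number of elements is 30% of the matrix size. Note that this ignores whether an element is already zero, so it matches the question only when the matrix contains no zeros (as with the rnorm data here). You can use the following code: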

M1<-matrix(rnorm(36),nrow=6)
M1
#>            [,1]       [,2]        [,3]       [,4]        [,5]       [,6]
#> [1,] -1.4181422 -0.1675572 -0.07126163 -0.2250808  0.06538817  0.7096829
#> [2,]  0.1265111  0.6535900 -0.81718699  0.1660550 -0.84969221  0.5222353
#> [3,] -0.5860745 -0.7130558  0.80823046  0.5601937  2.06109461 -1.4000195
#> [4,] -1.8507512 -0.2643667  0.62158830 -1.0455708 -1.28048923 -0.3291040
#> [5,] -1.5950047  0.6611776  1.19810322 -0.8927425 -0.70925100 -1.8455213
#> [6,] -1.2737187 -1.3739572 -0.92623331 -0.1034901  1.12354331 -0.6559306
M2 <- M1  # work on a copy so M1 stays unchanged
M2[sample(length(M2), length(M2)*0.3, replace = FALSE)] <- 0
M2
#>            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#> [1,] -1.4181422 -0.1675572  0.0000000  0.0000000  0.0000000  0.0000000
#> [2,]  0.1265111  0.6535900 -0.8171870  0.1660550 -0.8496922  0.0000000
#> [3,] -0.5860745  0.0000000  0.8082305  0.5601937  2.0610946 -1.4000195
#> [4,] -1.8507512 -0.2643667  0.6215883  0.0000000 -1.2804892 -0.3291040
#> [5,] -1.5950047  0.6611776  0.0000000 -0.8927425 -0.7092510 -1.8455213
#> [6,] -1.2737187 -1.3739572  0.0000000  0.0000000  1.1235433 -0.6559306
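The difference between sampling over the whole matrix and sampling over the non-zero cells shows up as soon as the matrix already contains zeros. The sketch below uses a made-up matrix that is half zeros and counts how many of the sampled positions land on cells that were already 0.

```r
set.seed(1)
M1 <- matrix(c(rep(0, 18), 1:18), nrow = 6)  # illustrative data: half the cells are 0

n_pick <- floor(length(M1) * 0.3)            # 30% of all cells
idx    <- sample(length(M1), n_pick)         # whole-matrix sampling

wasted     <- sum(M1[idx] == 0)              # picks that hit an existing zero
newly_zero <- sum(M1[idx] != 0)              # non-zero cells that would be zeroed
target     <- floor(sum(M1 != 0) * 0.3)      # 30% of the non-zero cells

cat("picked:", n_pick, " wasted on zeros:", wasted,
    " newly zeroed:", newly_zero, " target:", target, "\n")
```

Whenever some picks are wasted, the number of newly zeroed cells no longer tracks 30% of the non-zero cells, which is why the other answers sample from which(...) or a logical mask.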
