Issue with loop increments smaller than 1 when simulating data

Time:09-06

I'm trying to simulate data using a for loop and store it in a matrix with the following code:

m <- matrix(nrow = 500 , ncol = 7)

for(i in seq(from =  1, to = 4, by = 0.5)){
  
    a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
    m[, i] <- a
}

But instead of giving me 7 columns with means of roughly 1, 1.5, 2, 2.5, 3, 3.5 and 4, the matrix m contains 4 columns with means of roughly 1.5, 2.5, 3.5 and 4, and 3 columns of NA values.

If I change the increment to 1 and run the code below, everything behaves as expected, so the issue seems to be with the increments, but I can't figure out what I should do differently. Help would be most appreciated.

m <- matrix(nrow = 500 , ncol = 7)

for(i in seq(from =  1, to = 7, by = 1)){
  
    a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
    m[, i] <- a
}

CodePudding user response:

Column indices must be integers. In your case, you try to select column 1.5, which does not exist (R silently truncates it to 1). You can fix this by converting i to an integer index with a simple calculation, (i * 2) - 1:

# reduce number of rows for showcase
n <- 100
m <- matrix(nrow = n , ncol = 7)

for(i in seq(from =  1, to = 4, by = 0.5)){
  
  # NOTE: multiplying i by 1 does not change anything
  a <- 1*i + rnorm(n = n, mean = 0, sd = 1)
  
  # make column index integerish
  m[, (i * 2) - 1] <- a
}
m[1:5, ]
#>             [,1]      [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
#> [1,]  1.15699467 0.8917952 1.999899 2.330557 4.502607 4.469957 5.687460
#> [2,] -1.13634309 1.5394771 1.700148 1.669329 2.124019 3.472836 3.513351
#> [3,]  2.08584731 1.0591743 2.866186 3.192953 3.984286 3.593902 3.983265
#> [4,]  0.02211767 2.2222376 2.055832 2.927851 2.846376 3.411725 3.742966
#> [5,]  0.49167319 2.2244472 2.190050 3.525931 2.841522 5.722172 4.797856

colMeans(m)
#> [1] 0.8537568 1.6805235 1.9907633 2.6434843 2.8651140 3.5499583 3.9757984
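
An alternative that avoids index arithmetic altogether is to loop over integer positions and look the mean up from a vector. This is only a minimal sketch, not part of the answer above; the names means and j are chosen here for illustration.

```r
set.seed(1)  # only so the sketch is reproducible
n <- 100
means <- seq(from = 1, to = 4, by = 0.5)   # the 7 target means
m <- matrix(nrow = n, ncol = length(means))

# seq_along(means) yields the integers 1..7, which are always valid column indices
for (j in seq_along(means)) {
  m[, j] <- means[j] + rnorm(n = n, mean = 0, sd = 1)
}

dim(m)     # 100 rows, 7 columns
anyNA(m)   # FALSE: every column was filled
```

Because the column index and the mean are kept separate, changing the step size of seq() does not require adjusting the index formula.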

CodePudding user response:

rnorm accepts vectorized input for the mean argument, so you can try the code below (you still need matrix() to arrange the output into the desired dimensions of your matrix):

nr <- 500
nc <- 7
m <- t(matrix(rnorm(nr * nc, seq(1, 4, 0.5), 1), nc, nr))

This gives, for example:

> m[1:5, ]
           [,1]      [,2]      [,3]     [,4]     [,5]     [,6]     [,7]
[1,]  3.2157776 0.3805689 0.7550255 2.508356 3.567479 2.597378 4.122201
[2,]  0.8634009 0.4887092 2.5655513 1.710756 2.377790 3.733045 4.199812
[3,] -0.1786419 2.4471083 1.2138140 3.090687 2.763694 3.471715 4.676037
[4,]  1.2492511 2.3480447 2.2180039 1.965656 1.505342 3.832380 4.086075
[5,] -0.1301543 1.7463687 1.2467769 2.649525 4.795677 2.606623 4.318468

> colMeans(m)
[1] 0.901146 1.476423 1.900147 2.567463 2.996918 3.468140 4.025929
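
To see why the transpose is needed, it may help to look at how rnorm recycles the mean vector. The sketch below is an illustration under a simplifying assumption: sd is set to 0 so that every draw equals its mean and the pattern becomes visible. It also shows that byrow = TRUE produces the same layout as the t(matrix(...)) approach.

```r
# With sd = 0 every draw equals its mean, which exposes the recycling pattern
x <- rnorm(6, mean = c(1, 2, 3), sd = 0)
x                                  # 1 2 3 1 2 3

# Fill a 3 x 2 matrix column by column, then transpose to 2 x 3 ...
a <- t(matrix(x, nrow = 3, ncol = 2))
# ... or fill a 2 x 3 matrix row by row
b <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)

all(a == b)                        # TRUE: both layouts agree
a                                  # each column is constant: 1, 2, 3
```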

CodePudding user response:

You're using i as a column index, but i takes non-integer values. Only integers can be used to index a matrix or data frame. When i is, say, 1.5 and you use it in the expression m[, i], it is coerced to an integer and truncated to 1, so the first two iterations of your loop write to the same column and the second overwrites the first (the same happens with the 3rd and 4th, and so on).
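
The truncation can be checked directly on a tiny matrix. This is a small illustrative sketch; the values 10 and 20 are arbitrary.

```r
as.integer(c(1, 1.5, 2, 2.5))   # 1 1 2 2: fractional parts are dropped

m <- matrix(NA_real_, nrow = 2, ncol = 3)
m[, 1]   <- 10   # writes column 1
m[, 1.5] <- 20   # also writes column 1, replacing the 10s
m                # column 1 holds 20; columns 2 and 3 are still NA
```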

You could simply use your second code and replace 1*i with 0.5 + 0.5*i:

m <- matrix(nrow = 5000 , ncol = 7)

for(i in seq(from =  1, to = 7, by = 1)){
  
    a <- 0.5 + 0.5*i + rnorm(n = 5000, mean = 0, sd = 1)
    m[,i] <- a
}

However, it may be better to use the parameters of the rnorm function to generate values with a specified mean and sd. Currently, you are drawing from a normal distribution centered at 0 and then shifting it; you could simply tell rnorm to use the mean you actually want.

m <- matrix(nrow = 5000 , ncol = 7)

for(i in seq(from =  1, to = 7, by = 1)){
  
    m[,i] <- rnorm(n = 5000, mean = 0.5 + 0.5*i, sd = 1)
}
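
The same idea can also be written without an explicit loop. The following is a hedged sketch rather than part of the answer above: it assumes sapply is acceptable here and uses the illustrative names means and mu. sapply returns one column per element of means, so no indexing is needed.

```r
set.seed(1)                    # only so the sketch is reproducible
n <- 5000
means <- 0.5 + 0.5 * (1:7)     # 1, 1.5, ..., 4

# Each call returns n draws; sapply binds the results as columns of a matrix
m <- sapply(means, function(mu) rnorm(n = n, mean = mu, sd = 1))

dim(m)                         # 5000 rows, 7 columns
round(colMeans(m), 1)          # should be close to the target means
```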