set.seed in for loop

Time:05-03

I'm doing some analysis and had to impute some values. To do so, I wrote this chunk of code:

A)

set.seed(1)
for (i in 2:length (Dataset[-c(8,11)])) { 
      Dataset[,i]<-impute(Dataset[,i], "random")
}

[[The -c(8,11) is there to exclude two character columns]]

This does not give me any error, so that is not what I'm asking about. My question is: is it correct to put set.seed(1) outside the for loop? The second time I ran this code, the results (at the end of the analysis) were different. So I put set.seed(1) inside the for loop, like this:

B)

for (i in 2:length (Dataset[-c(8,11)])) { 
      set.seed(1)
      Dataset[,i]<-impute(Dataset[,i], "random")
}

This gave me a reproducible result, but if I move set.seed outside the loop again, the result now stays the same as in B (when it was inside the for loop).

So I'm quite confused: why does this happen? What is wrong with the syntax? How can I effectively write a for loop with a set.seed to impute some values in the data set?

CodePudding user response:

First, your code does not do what you think it is doing. The problem is

for (i in 2:length(Dataset[-c(8,11)]))

You are not removing the columns from the loop, only from the data frame whose length you take. If the data frame has 20 columns, the loop runs from column 2 to column 18, because dropping two columns only reduces the count; columns 8 and 11 are still visited. To skip them you have to index the sequence itself, for example i in setdiff(2:length(Dataset), c(8, 11)). Since impute will jump over these columns if they are character data, you don't need to exclude them from the loop.
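To make the indexing point concrete, here is a small sketch in base R. The 12-column data frame is a made-up stand-in for Dataset (the real one is not shown in the question), so treat the column count as an assumption.

```r
# Hypothetical 12-column data frame standing in for Dataset
Dataset <- as.data.frame(matrix(0, nrow = 3, ncol = 12))

# The loop range from the question: dropping two columns only shortens the count
wrong <- 2:length(Dataset[-c(8, 11)])
print(wrong)   # 2 to 10, so columns 8 is still visited and 11, 12 are never reached

# Indexing the length itself does not help either: `[` binds tighter than `:`,
# so this is 2:(length(Dataset)[-c(8, 11)]), which is just 2:12
also_wrong <- 2:length(Dataset)[-c(8, 11)]
print(also_wrong)

# Remove the unwanted column numbers from the sequence instead
cols <- setdiff(2:length(Dataset), c(8, 11))
print(cols)

for (i in cols) {
  # Dataset[, i] <- impute(Dataset[, i], "random") would go here
}
```

Using setdiff avoids having to translate column numbers into positions within 2:length(Dataset), which is easy to get wrong by one.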

Second, we can test your question about the reproducibility of the results when the seed appears outside the loop. Here is a small example using the iris data set that comes with R:

data(iris)
set.seed(42)
Data <- iris[1:25, -5]
idx <- matrix(replicate(4, sample.int(25, 5)), 20)
idx <- cbind(idx, rep(1:4, each=5))
Data[idx] <- NA
head(Data)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1           NA         3.5          1.4         0.2
# 2          4.9         3.0          1.4         0.2
# 3          4.7         3.2          1.3          NA
# 4           NA          NA          1.5          NA
# 5           NA         3.6           NA          NA
# 6          5.4         3.9          1.7         0.4

Now we impute the missing values three times:

library(Hmisc)
Orig <- Data  # keep a copy that still contains the NAs
set.seed(1)
for (i in 1:4) {
    Data[, i] <- impute(Data[, i], "random")
}
Data[idx]
#  [1] 5.8 4.9 4.6 4.7 5.4 3.4 3.5 3.3 3.6 3.8 1.3 1.5 1.5 1.5 1.5 0.2 0.2 0.2 0.1 0.3
Data <- Orig  # restore the NAs so there is something to impute again
set.seed(1)
for (i in 1:4) {
    Data[, i] <- impute(Data[, i], "random")
}
Data[idx]
#  [1] 5.8 4.9 4.6 4.7 5.4 3.4 3.5 3.3 3.6 3.8 1.3 1.5 1.5 1.5 1.5 0.2 0.2 0.2 0.1 0.3
Data <- Orig  # restore the NAs once more
set.seed(1)
for (i in 1:4) {
    Data[, i] <- impute(Data[, i], "random")
}
Data[idx]
#  [1] 5.8 4.9 4.6 4.7 5.4 3.4 3.5 3.3 3.6 3.8 1.3 1.5 1.5 1.5 1.5 0.2 0.2 0.2 0.1 0.3

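The imputed values are the same each time, so setting the seed once before the loop is enough. Keep in mind that impute only fills in missing values: once a column has been imputed, running the loop again on the same data frame has nothing left to change, whatever the seed. That is most likely why your result appeared stuck at B. To compare runs fairly, start each one from a copy of the data that still contains the NAs.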
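If you want to check this without Hmisc, the following base R sketch shows the same behaviour. The helper impute_random is hypothetical: it only mimics the idea of random imputation by sampling from the observed values and is not the Hmisc implementation. The toy data frame is made up for illustration.

```r
# Hypothetical stand-in for random imputation: fill NAs by sampling
# (with replacement) from the observed values of the same vector
impute_random <- function(x) {
  miss <- is.na(x)
  if (any(miss)) {
    obs <- x[!miss]
    x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  x
}

# Toy data with missing values (made up for illustration)
Orig <- data.frame(a = c(1, NA, 3, NA, 5), b = c(NA, 20, 30, NA, 50))

run_once <- function(seed) {
  Data <- Orig        # always start from the copy that still has NAs
  set.seed(seed)      # seed once, before the loop
  for (i in seq_along(Data)) {
    Data[[i]] <- impute_random(Data[[i]])
  }
  Data
}

first  <- run_once(1)
second <- run_once(1)
print(identical(first, second))   # same seed and same starting data

# Re-running on already imputed data changes nothing, even with another seed,
# because no NAs are left to fill
again <- first
set.seed(999)
for (i in seq_along(again)) {
  again[[i]] <- impute_random(again[[i]])
}
print(identical(again, first))
```

Seeding once before the loop also lets each column draw from a different part of the random stream, whereas calling set.seed inside the loop restarts the stream for every column.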
