I am working with the R programming language.
I have the following data set:
my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
I want to create a new variable that generates a single random integer for each row based on the corresponding value of "n". I tried to do this with the following code:
my_data$rand = sample.int(my_data$n,1)
But this is not working (the same random number is repeated 5 times).
I also tried to define a function to this:
my_function <- function(x){sample.int(x,1)}
transform(my_data, new_column= my_function(my_data$n) )
But this is also not working (the same random number is again repeated 5 times)..
In the end, I am trying to achieve something like this :
my_data$rand = c(sample.int(15,1), sample.int(3,1), sample.int(51,1), sample.int(8,1), sample.int(75,1))
Can someone please show me how to do this for larger datasets without having to manually specify each "sample.int" command?
Thanks!
CodePudding user response:
When you say "based on value of n
" what do you mean by that exactly? Based on n
how?
Guess#1: at each row, you want to draw one random number with possible values being 1 to n
.
Guess#2: at each row, you want to draw n
random numbers for possible values between 0 and 1.
Second option is harder, but option #1 can be done with a loop:
my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
my_data$rand = NA
set.seed(123)
for(i in 1:nrow(my_data)){
my_data$rand[i] = sample(1:(my_data$n[i]), size = 1)
}
my_data
id n rand
1 1 15 15
2 2 3 3
3 3 51 51
4 4 8 6
5 5 75 67
CodePudding user response:
We can use sapply
to go over all rows in my_data
, and generate one sample.int
per iteration.
my_data$rand <- sapply(1:nrow(my_data), function(x) sample.int(my_data[x, 2], 1))
id n rand
1 1 15 7
2 2 3 2
3 3 51 28
4 4 8 6
5 5 75 9
CodePudding user response:
You can do this efficiently by a single call to runif()
, multiplying by n
, and rounding up:
transform(my_data, rand = ceiling(runif(n) * n))
id n rand
1 1 15 13
2 2 3 1
3 3 51 41
4 4 8 1
5 5 75 9