R: Creating Random Samples From Entries in Neighboring Row

I am working with the R programming language.

I have the following data set:

my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))

I want to create a new variable that generates a single random integer for each row based on the corresponding value of "n". I tried to do this with the following code:

my_data$rand = sample.int(my_data$n,1)

But this is not working (the same random number is repeated 5 times).

I also tried to define a function to do this:

my_function <- function(x){sample.int(x,1)}

transform(my_data, new_column= my_function(my_data$n) )

But this is also not working (the same random number is again repeated 5 times).

In the end, I am trying to achieve something like this:

my_data$rand = c(sample.int(15,1), sample.int(3,1), sample.int(51,1), sample.int(8,1), sample.int(75,1))

Can someone please show me how to do this for larger datasets without having to manually specify each "sample.int" command?

Thanks!

CodePudding user response:

When you say "based on the value of n", what exactly do you mean? Based on n how?

Guess #1: at each row, you want to draw one random integer from the values 1 to n.
Guess #2: at each row, you want to draw n random numbers between 0 and 1.

The second option is harder, but Guess #1 can be done with a loop:

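my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
my_data$rand = NA  # placeholder column, filled in row by row below

set.seed(123)  # fix the seed so the draws are reproducible
# seq_len() is safe even when the data frame has zero rows
for(i in seq_len(nrow(my_data))){
  # draw one integer uniformly from 1..n[i]
  my_data$rand[i] = sample(1:(my_data$n[i]), size = 1)
}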

my_data

  id  n rand
1  1 15   15
2  2  3    3
3  3 51   51
4  4  8    6
5  5 75   67
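
For Guess #2, one possible approach is a list column, since each row would need to hold a vector of a different length. The sketch below is only an illustration of that interpretation; the column name draws is my own choice and not something from the question.

```r
my_data <- data.frame(id = c(1, 2, 3, 4, 5), n = c(15, 3, 51, 8, 75))

set.seed(123)
# lapply() calls runif(k) once per value k of n, so element i of the
# result is a numeric vector of n[i] draws from Uniform(0, 1).
draws <- lapply(my_data$n, runif)

# Store the result as a list column ("draws" is a hypothetical name)
# so every row keeps its own vector.
my_data$draws <- draws

# Number of draws stored in each row; this should match my_data$n.
lengths(my_data$draws)
```

A list column keeps everything in one data frame, but many functions expect atomic columns, so depending on what comes next it may be simpler to keep the list separate.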

CodePudding user response:

We can use sapply to go over the values of n in my_data and call sample.int once per row, so that each row gets its own draw. Referring to the column by name avoids depending on its position in the data frame.

my_data$rand <- sapply(my_data$n, function(k) sample.int(k, 1))

  id  n rand
1  1 15    7
2  2  3    2
3  3 51   28
4  4  8    6
5  5 75    9
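
If you want the result type to be checked, a variant of the same idea uses vapply, which fails with an error if any iteration returns something other than a single integer. This is just a sketch of that alternative; the seed value is arbitrary.

```r
my_data <- data.frame(id = c(1, 2, 3, 4, 5), n = c(15, 3, 51, 8, 75))

set.seed(42)  # arbitrary seed, only for reproducibility
# vapply() applies the function to each value of n and insists that
# every result is exactly one integer (the integer(1) template).
my_data$rand <- vapply(my_data$n,
                       function(k) sample.int(k, 1L),
                       integer(1))

# Every draw should fall in 1..n for its own row.
all(my_data$rand >= 1 & my_data$rand <= my_data$n)
```

The behaviour is the same as with sapply here; the difference is only that vapply never silently changes the shape or type of its result.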

CodePudding user response:

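You can do this efficiently with a single vectorized call to runif(): draw one uniform value per row, multiply by n, and round up. Inside transform(), n refers to the whole column, so runif(length(n)) makes the number of draws explicit (runif(n) happens to return the same number of values, because runif() uses the length of a vector argument as the count):

transform(my_data, rand = ceiling(runif(length(n)) * n))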

  id  n rand
1  1 15   13
2  2  3    1
3  3 51   41
4  4  8    1
5  5 75    9
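
To convince yourself that the rounding trick stays inside 1..n for every row, you could repeat it many times and check the range. The helper name draw_one_per_row and the number of repetitions are my own choices for this sketch.

```r
# Hypothetical helper: one integer per element of n, each in 1..n[i].
draw_one_per_row <- function(n) {
  ceiling(runif(length(n)) * n)
}

n <- c(15, 3, 51, 8, 75)

set.seed(1)
# replicate() returns a 5 x 1000 matrix: one column per repetition,
# one row per element of n.
draws <- replicate(1000, draw_one_per_row(n))

# n is recycled down each column, so row i is compared against n[i].
all(draws >= 1 & draws <= n)
```

The check relies on runif() not returning exactly 0, which is what its documentation states for the default range; otherwise ceiling() could produce a 0.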