R: Creating Random Samples From Entries in Neighboring Row

Time:06-04

I am working with the R programming language.

I have the following data set:

my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))

I want to create a new variable that generates a single random integer for each row based on the corresponding value of "n". I tried to do this with the following code:

my_data$rand = sample.int(my_data$n,1)

But this is not working (the same random number is repeated 5 times).

I also tried to define a function to do this:

my_function <- function(x){sample.int(x,1)}

transform(my_data, new_column= my_function(my_data$n) )

But this is also not working (the same random number is again repeated 5 times).

In the end, I am trying to achieve something like this:

my_data$rand = c(sample.int(15,1), sample.int(3,1), sample.int(51,1), sample.int(8,1), sample.int(75,1))

Can someone please show me how to do this for larger datasets without having to manually specify each "sample.int" command?

Thanks!

CodePudding user response:

What exactly do you mean by "based on the value of n"?

Guess #1: for each row, you want to draw one random integer between 1 and n.
Guess #2: for each row, you want to draw n random numbers between 0 and 1.

The second option is harder, but option #1 can be done with a loop:

my_data = data.frame(id = c(1,2,3,4,5), n = c(15,3,51,8,75))
my_data$rand = NA

set.seed(123)
for(i in seq_len(nrow(my_data))){
  # draw one integer between 1 and the n of row i
  my_data$rand[i] = sample(1:(my_data$n[i]), size = 1)
}

my_data
  id  n rand
1  1 15   15
2  2  3    3
3  3 51   51
4  4  8    6
5  5 75   67
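
For Guess #2 (n uniform draws per row), one possible sketch is to keep the draws in a list column, since each row would hold a different number of values. The column name draws is made up for illustration and is not part of the original question.

```r
my_data <- data.frame(id = c(1, 2, 3, 4, 5), n = c(15, 3, 51, 8, 75))

set.seed(123)  # arbitrary seed, only so repeated runs give the same draws

# lapply() calls runif(k) once per row, so row i gets n[i] values in [0, 1].
# `draws` is a hypothetical column name; it is a list column because the
# rows hold different numbers of values.
my_data$draws <- lapply(my_data$n, function(k) runif(k))

# Number of draws stored in each row; should match the n column
lengths(my_data$draws)
```

Individual rows can then be inspected with, for example, my_data$draws[[2]].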

CodePudding user response:

We can use sapply to go over the values of n in my_data, calling sample.int once per row.

my_data$rand <- sapply(my_data$n, function(n) sample.int(n, 1))

  id  n rand
1  1 15    7
2  2  3    2
3  3 51   28
4  4  8    6
5  5 75    9
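
A possible variant of the same idea, sketched with vapply instead of sapply so that the return type is checked. The seed value is arbitrary and only there to make the run repeatable.

```r
my_data <- data.frame(id = c(1, 2, 3, 4, 5), n = c(15, 3, 51, 8, 75))

set.seed(42)  # arbitrary seed, only so repeated runs give the same draws

# vapply() works like sapply() but checks that every call returns exactly
# one integer, which is what sample.int(k, 1) produces.
my_data$rand <- vapply(my_data$n, function(k) sample.int(k, 1), integer(1))

# Sanity check: every draw should lie between 1 and the row's n
all(my_data$rand >= 1 & my_data$rand <= my_data$n)
```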

CodePudding user response:

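You can do this efficiently with a single vectorized call to runif(): draw one uniform number per row, multiply by n, and round up. Using length(n) makes the number of draws explicit, so this also works for a one-row data frame (there, runif(n) would return n numbers instead of one):

transform(my_data, rand = ceiling(runif(length(n)) * n))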

  id  n rand
1  1 15   13
2  2  3    1
3  3 51   41
4  4  8    1
5  5 75    9
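
To see that rounding up a scaled uniform draw behaves like picking an integer from 1 to n, a rough empirical check could look like the sketch below. The 10,000 rows with n = 8 are invented purely for the check.

```r
set.seed(1)  # arbitrary seed for reproducibility

# 10,000 hypothetical rows that all have n = 8
n <- rep(8, 10000)
rand <- ceiling(runif(length(n)) * n)

# All values should stay between 1 and 8
range(rand)

# Each value from 1 to 8 should appear roughly 1/8 of the time
table(rand) / length(rand)
```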