Creating 100 datasets with 100 datapoints each in R^2

I need to simulate 100 datasets from two normal distributions, such that each element is a point $(X_i, Y_i)$, where $X_i$ and $Y_i$ are realizations from the respective distributions. Each dataset needs to have 100 datapoints, but I have no clue how to do this.

The software I am using is RStudio.

CodePudding user response：

Like others, I'm not sure what you are looking for. Is it possible that what you want is something like this?

instance <- function () {list(rnorm(10))}
a <- replicate(3,instance()) # only 3 lists to show what's happening w/o clutter
                             # returns a list of 3 sets of 10 z-scores.

You can create a list of whatever length you want with replicate(), and you can process each list member (i.e., each instance) with lapply() or sapply() as follows:

sapply(a,FUN=mean) # Note NO parentheses here for the function.
                   # Obviously you can create/use your own function.
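
Adapting the same replicate() idea to the question, a minimal sketch might look like the following. It assumes that X and Y are drawn independently; the means and standard deviations are placeholder values that the question does not specify, and make_dataset is a hypothetical helper name.

```r
set.seed(1)  # only for reproducibility; any seed works

# Hypothetical helper: one dataset of n points (X_i, Y_i),
# with X and Y drawn independently from two normal distributions.
# The default means and sds are assumed placeholder values.
make_dataset <- function(n = 100, x_m = 0, x_sd = 1, y_m = 0, y_sd = 1) {
  data.frame(x = rnorm(n, x_m, x_sd),
             y = rnorm(n, y_m, y_sd))
}

# simplify = FALSE keeps the result as a plain list of data frames
datasets <- replicate(100, make_dataset(), simplify = FALSE)

length(datasets)     # number of datasets
dim(datasets[[1]])   # rows and columns of the first dataset

# Per-dataset summaries, e.g. the mean of x in each dataset
x_means <- sapply(datasets, function(d) mean(d$x))
```

Each list element is then a 100-row data frame, so datasets[[k]] gives the k-th dataset and lapply() or sapply() can be applied across all of them.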

CodePudding user response：

You can generate random numbers from a variety of distributions. Run ?distributions in the R console to see the help on this. If you want to get 100 numbers from a normal distribution with mean 50 and standard deviation 10, you can run rnorm(100, 50, 10), equivalent to rnorm(n = 100, mean = 50, sd = 10).
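
As a quick check of that equivalence, here is a small sketch that resets the seed before each call so both forms draw the same numbers. The seed value 42 is arbitrary and only there for reproducibility.

```r
# Positional arguments
set.seed(42)
a <- rnorm(100, 50, 10)

# Named arguments, same seed
set.seed(42)
b <- rnorm(n = 100, mean = 50, sd = 10)

identical(a, b)  # TRUE: both calls produce the same draws
length(a)        # 100
```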

The simplest way to make 100 datasets where all the x's and y's are pulled from the same distributions would be to pull those in one step:

# Some parameters
n_sets <- 100   # number of datasets
n_pts  <- 100   # points per dataset
x_m  <- 50      # mean of x
x_sd <- 20      # standard deviation of x
y_m  <- 20      # mean of y
y_sd <- 10      # standard deviation of y

# Make the datasets all at once
df <- data.frame(
  # repeat each value in 1..n_sets n_pts times: 1,1,...,1,2,2,...
  dataset_id = rep(1:n_sets, each = n_pts),
  x = rnorm(n_sets * n_pts, x_m, x_sd),
  y = rnorm(n_sets * n_pts, y_m, y_sd))

That will generate a 10,000-row table, where the dataset_id column identifies which dataset each row belongs to.
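
If separate objects per dataset are more convenient than one long table, a sketch along these lines could split the table by dataset_id or summarize it per dataset. The table is rebuilt here so the snippet runs on its own; the seed is arbitrary, and the names sets and means are just illustrative.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n_sets <- 100
n_pts  <- 100
df <- data.frame(dataset_id = rep(1:n_sets, each = n_pts),
                 x = rnorm(n_sets * n_pts, 50, 20),
                 y = rnorm(n_sets * n_pts, 20, 10))

# One data frame per dataset, as a list of length n_sets
sets <- split(df, df$dataset_id)

# Per-dataset means of x and y, one row per dataset_id
means <- aggregate(cbind(x, y) ~ dataset_id, data = df, FUN = mean)
head(means)
```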

We could plot those to get a sense of the result:

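library(ggplot2)
ggplot(df, aes(x, y)) +
  geom_point(size = 0.2) +   # small points so 100 panels stay readable
  coord_equal() +            # same scale on both axes
  facet_wrap(~dataset_id)    # one panel per dataset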