I need to simulate 100 dataset from two normal distributions, such that each element has the point $(X_i, Y_i)$, where X_i and Y_i are a realization from the distribution. Each dataset needs to have 100 datapoints. But I have no clue how to do this.
The software I am using is Rstudio
CodePudding user response:
Like others, not sure what you are looking for. Is it possible what you want is something like this?
instance <- function () {list(rnorm(10))}
a <- replicate(3,instance()) # only 3 lists to show what's happening w/o clutter
# returns a list of 3 sets of 10 z-scores.
You can create a list of whatever length you want with replicate() and you can process each list member (i.e., each instance ) with lapply() or sapply() as follows:
sapply(a,FUN=mean) # Note NO parentheses here for the function.
# Obviously you can create/use your own function.
CodePudding user response:
You can generate random numbers from a variety of distributions. Run ?distributions
in the R console to see the help on this. If you want to get 100 numbers from a normal distribution with mean 50 and standard deviation 10, you can run rnorm(100, 50, 10)
, equivalent to rnorm(n = 100, mean = 50, sd = 10)
.
The simplest way to make 100 datasets where all the x's and y's are pulled from the same distributions would be to pull those in one step:
# Some parameters
n_sets = 100
n_pts = 100
x_m = 50
x_sd = 20
y_m = 20
y_sd = 10
# Make the datasets all at once
df <- data.frame(dataset_id = rep(1:n_sets, each = n_pts),
#repeat the sequence of 1 to n_sets, each # n_pts times
x = rnorm(n_sets*n_pts, x_m, x_sd),
y = rnorm(n_sets*n_pts, y_m, y_sd))
That will generate a 10,000 row table, where the dataset_id
column distinguishes each data set.
We could plot those to get a sense of the result:
library(ggplot2)
ggplot(df, aes(x,y))
geom_point(size = 0.2)
coord_equal()
facet_wrap(~dataset_id)