I have to create a synthetic dataset with multiple variables and more than 50 observations. I have chosen to create synthetic data for an oil field that has 10 wells and five producing reservoirs, so my data frame will have three variables: "Well ID", "Reservoir Name", and "Rock Quality".
I want a data frame in which each well has 5 reservoirs, and each reservoir has 3 rock qualities: "Sand", "Shale", and "Cement". That gives 10 × 5 × 3 = 150 rows.
I tried it for two variables in a crude way:
```r
well1 <- data.frame(Wells = rep(1, 5), Reservoirs = c("A", "B", "C", "D", "E"))
well2 <- data.frame(Wells = rep(2, 5), Reservoirs = c("A", "B", "C", "D", "E"))
# ... one data frame like this for each remaining well ...
static_data <- rbind(well1, well2)  # ... plus the remaining wells
```
Now I am struggling with how to add the third variable. Is there a smarter way of doing this?
I am looking for something like this:
Well | Reservoir | Rock Quality |
---|---|---|
1 | A | Sand |
1 | A | Shale |
1 | A | Cement |
1 | B | Sand |
1 | B | Shale |
1 | B | Cement |
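This layout is the Cartesian product of the three variables, with rock quality varying fastest inside each reservoir and reservoir inside each well. As a point of comparison, here is a minimal base-R sketch using `expand.grid()`, assuming the 10 wells are numbered 1 to 10. The column name `Rock_Quality` (with an underscore) is an illustrative choice that avoids backticks; rename it if you need a space.

```r
# Inputs taken from the question: 10 wells, reservoirs A-E, three rock qualities
wells <- 1:10
reservoirs <- c("A", "B", "C", "D", "E")
rock_qualities <- c("Sand", "Shale", "Cement")

# expand.grid() varies its FIRST argument fastest, so the columns are listed
# in reverse (rock quality first) to get the Well > Reservoir > Rock nesting
grid <- expand.grid(
  Rock_Quality = rock_qualities,
  Reservoir = reservoirs,
  Well = wells,
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)

# Reorder the columns to Well, Reservoir, Rock_Quality
static_data <- grid[, c("Well", "Reservoir", "Rock_Quality")]

head(static_data, 6)
nrow(static_data)  # 150
```

Because no manual `rbind()` is involved, changing the number of wells or reservoirs only means changing the input vectors.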
Answer:
The data.table package has a cross-join function, `CJ()`, that I think gives what you need:
```r
library(data.table)
# every combination of a, b and c: 3 x 2 x 2 = 12 rows
CJ(a = c(1, 2, 3), b = c('a', 'b'), c = c('Y', 'Z'))
```
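Applied to the oil-field example, a sketch might look like the following, assuming wells numbered 1 to 10 and the illustrative column name `Rock_Quality`. By default `CJ()` sorts each input (`sorted = TRUE`), which would put "Cement" before "Sand" and "Shale"; passing `sorted = FALSE` keeps the order in which the values are given, so the rows come out in the same order as the table in the question.

```r
library(data.table)

# Cross join of the three variables; the first column varies slowest and the
# last column fastest. sorted = FALSE preserves the input order of the values.
static_data <- CJ(
  Well = 1:10,
  Reservoir = c("A", "B", "C", "D", "E"),
  Rock_Quality = c("Sand", "Shale", "Cement"),
  sorted = FALSE
)

head(static_data, 6)
nrow(static_data)  # 150
```

The result is a `data.table`; wrap it in `as.data.frame()` if you need a plain data frame.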