I would like to simulate flight numbers by generating flight numbers from a simulation table (table_simul), by taking an observation table as a basis.
table_simul <- data.table(
date_f = c("2020-01-01","2020-01-02","2020-01-03","2020-01-03"),
city = c("Paris","Paris","London", "Berlin")
)
table_obs <- data.table(
city = c("Paris","Paris","Paris","London","London", "Berlin"),
flight = c(1,2,7,4,5,14),
weight = c(0.33,0.33,0.33,0.5,0.5,1)
)
Used data:
Table simul:
date city
2020-01-01 Paris
2020-01-01 Paris
2020-01-01 London
2020-01-01 Berlin
Table obs
---------
city flight weight
Paris 1 0.33
Paris 2 0.33
Paris 7 0.33
London 4 0.5
London 5 0.5
Berlin 14 1
Expected result:
date_f city flight
2020-01-01 Paris 2
2020-01-02 Paris 2
2020-01-03 London 4
2020-01-03 Berlin 14
I would like to use the data table package because the volume of data is very large. Here's what I tried but didn't work:
get_flight_sample <- function(param_city){
table_simul[city==param_city]
res <- sample(table_obs$flight,1, replace=T, prob = table_obs$weight)
}
res <- table_simul[,.(flight = get_flight_sample(city))]
CodePudding user response:
Maybe use a helper function to sample from table_obs
and then join with table_simul
f <- function(i, ...) {
if(length(i) == 1) i else sample(i, size = 1, ...)
}
set.seed(42)
tmp <- table_obs[, .(flight = f(flight, prob = weight)), by = city]
table_simul[, flight := tmp[table_simul, on = .(city)]$flight]
table_simul
# date_f city flight
#1: 2020-01-01 Paris 1
#2: 2020-01-02 Paris 1
#3: 2020-01-03 London 4
#4: 2020-01-03 Berlin 14
CodePudding user response:
Would something like this help?
simul_df <- data.frame()
sapply(c(1:1000), FUN = function(datos) {
date_f = as.Date(sample(seq(as.Date('2021/01/01'), as.Date('2022/12/31'), by="day"), 1))
city = sample(table_obs$city, size = 1)
flight = sample(table_obs$flight, size = 1)
weight = sample(table_obs$weight, size = 1)
simul_df <<- as.data.frame(rbind(simul_df, cbind(date_f, city, flight, weight)))
})
At the end you get something like this:
date_f city flight weight
1 19342 London 4 0.5
2 19081 Paris 7 0.5
3 18916 London 4 0.5
4 19338 Paris 5 0.5
5 19313 Paris 4 0.33
6 18731 Paris 1 0.33
7 18877 London 5 0.33
8 18864 Paris 7 0.5
9 19126 Paris 5 0.33
10 19086 Paris 2 0.33
You will get a data frame that you can convert to data.table afterwards. Best