I built a Bayesian model with the usual components (data, parameters, model, likelihood).
The model is a linear regression:
library(ggplot2)
#library (ggedit)
library(plyr)
library(StanHeaders)
library(rstan)
# Equation (1)
for(i in 1:N){
  alphaC_P[i] ~ normal(alphaC_A[Date[i]] * (1 - F_T[Date[i]]) +
                       alphaC_T[i] * F_T[Date[i]], sigma_C);
}
Due to memory needs, I am running this analysis on a cluster.
I prepare the list of elements, e.g.:
# Equation (2)
mylist <- list()
Finally, I run the Bayesian analysis on the cluster.
# Equation (3)
rstan::stan(file = args[2], data = mylist, cores = 12, warmup = 48000,
            iter = 50000, chains = 4, seed = 14)
# args[2] is the path to the file containing the Bayesian (Stan) model
Since my data has NAs, my question is:
Where should I include the instruction to omit/ignore/exclude the NAs?
e.g., should it be in Equation #1, #2 or #3?
Finally, which should I do: omit, ignore, or exclude them?
Thanks in advance
Answer:
Your code example is not very clear. For example, in your first code snippet, you're mixing R with Stan syntax.
From a Stan perspective it's very simple: Stan does not accept NAs in data. You can do two things:
- Either change NAs to some arbitrary number and then have your Stan model check for the presence of this magic number in your data and deal with it in a suitable way (e.g. remove, ignore, replace, impute),
- Or deal with NAs at the R level (e.g. remove, replace, impute) before sending data to the Stan model.
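A minimal sketch of the second option, assuming your observations sit in a data frame with one row per observation. The data frame `df` and its values are made up for illustration; only the column names are taken from the question. In terms of the question's numbering, this happens where `mylist` is built (Equation 2):

```r
# Hypothetical example data; column names follow the question,
# the values are invented for illustration only.
df <- data.frame(
  alphaC_P = c(0.91, NA, 0.88, 0.95, NA),
  alphaC_T = c(0.80, 0.82, NA, 0.79, 0.81),
  Date     = c(1L, 1L, 2L, 2L, 3L)
)

# Keep only rows where every column used in the likelihood is observed.
keep   <- complete.cases(df[, c("alphaC_P", "alphaC_T", "Date")])
df_obs <- df[keep, ]

# Build the data list from the filtered rows; N must match the new length.
mylist <- list(
  N        = nrow(df_obs),
  alphaC_P = df_obs$alphaC_P,
  alphaC_T = df_obs$alphaC_T,
  Date     = df_obs$Date
)

# Sanity check before calling rstan::stan(): no NA may remain.
stopifnot(!anyNA(unlist(mylist)))
```

Dropping incomplete rows is only one possible treatment. Whether it is acceptable for your data is a separate question from where in the code it is done.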
As to how to deal with NAs, that really depends on your data and data collection process, which only you know the details of (in other words, this needs domain-specific knowledge).
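If you prefer the first option (a magic number), a rough sketch could look like the following. Everything here is an assumption for illustration: the sentinel value `-999`, the variable names `y`, `mu` and `sigma`, and the toy Stan program, which is not the model from the question and omits priors. The sentinel must be a value that cannot occur in the real data.

```r
# Hypothetical response vector with missing values (made-up numbers).
y <- c(0.91, NA, 0.88, 0.95, NA)

# Sentinel chosen by assumption; it must lie outside the possible range of y.
sentinel <- -999
y_stan   <- ifelse(is.na(y), sentinel, y)

mylist <- list(N = length(y_stan), y = y_stan, sentinel = sentinel)

# Illustrative Stan program (not the model from the question): entries equal
# to the sentinel are skipped, so they contribute nothing to the likelihood.
stan_code <- "
data {
  int<lower=1> N;
  vector[N] y;
  real sentinel;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  for (i in 1:N) {
    if (y[i] != sentinel) {
      y[i] ~ normal(mu, sigma);
    }
  }
}
"

# To fit, on a machine where rstan is installed:
# fit <- rstan::stan(model_code = stan_code, data = mylist)
```

The explicit loop is needed in this variant because the check is done per element. Skipping the sentinel entries this way is equivalent to ignoring those observations; replacing or imputing them would need extra model code.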
Lastly, a lot of operations in Stan are vectorised. So instead of writing e.g.
for (i in 1:N)
  y[i] ~ normal(mu[i], sigma);
you can (and should) write
y ~ normal(mu, sigma);