3.4 Interface to Stan
Stan is another open-source program similar to JAGS and OpenBUGS, but with two important differences: (i) the language it uses is slightly more complex, but also more flexible, and (ii) it uses specialized sampling algorithms designed to work efficiently on hierarchical models. Despite these differences, Stan also takes data and a model specification file (written in Stan’s own language) as inputs and either draws samples from the posterior distribution of the model’s parameters or latent variables, or maximizes the respective likelihood function.
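To give a sense of what a model specification file written in Stan’s own language looks like, the following Python sketch writes a small linear-regression Stan program to disk. The model, the file name and all variable names (N, x, y, alpha, beta, sigma) are hypothetical and chosen only for illustration; how BayES matrices map to Stan data types is not covered by this sketch.

```python
from pathlib import Path

# Hypothetical Stan program: a linear regression of y on x.
# All names (N, x, y, alpha, beta, sigma) are illustrative assumptions.
STAN_PROGRAM = """\
data {
  int<lower=1> N;        // number of observations
  vector[N] x;           // explanatory variable
  vector[N] y;           // dependent variable
}
parameters {
  real alpha;            // intercept
  real beta;             // slope
  real<lower=0> sigma;   // error standard deviation
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"""


def write_model_file(path="myModel.stan"):
    """Write the Stan program to `path` and return it as a Path object."""
    target = Path(path)
    target.write_text(STAN_PROGRAM, encoding="utf-8")
    return target
```

The resulting file name could then be supplied as the model specification file, with the data matrices named as they appear in the data block.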
BayES’ stan() function provides a convenient interface to Stan, which allows the user to:
- pass BayES matrices as input data to Stan
- request Stan to draw samples from the posterior distribution of the model’s parameters and latent data or maximize the likelihood function, given a model specification file
- retrieve the draws or other results from Stan, summarize them and print summary statistics on the BayES console
- store the draws or maximum-likelihood estimates from Stan in a BayES model item, making them available for post-estimation analysis
The general syntax of the stan() function is the following (see footnote 4):
[<model name> = ] stan( <model specification file>
[, "method"="sample"|"variational"|"optimize"]
[, "data"=<list of matrices to pass to Stan>]
[, "inits"=<structure of initial values>]
[, "output"=<file where Stan should store its results>]
[, "diagnostic"=<file where Stan should store results for diagnostics>]
[, "options"=<additional command-line options>]
[, "summarize"=true|false]
[, "diagnose"=true|false]
[, "chains"=<positive integer>]
[, "burnin"=<positive integer>]
[, "draws"=<positive integer>]
[, "thin"=<positive integer>]
[, "seed"=<positive integer>]
[, "refresh"=<positive integer>]
);
where:
- <model name> is a BayES id value which will be associated with the model resulting from executing the stan() function. If no model name is provided, the results from Stan will still be returned to BayES and summarized, but they will not be stored for further analysis. jags(), openbugs() and stan() are the three interface functions that provide the highest level of integration with BayES: the results from these functions are stored in BayES model items, on which all BayES functions that operate on models can be used.
- <model specification file> is a string pointing to the file which contains the specification of the Stan model. If the specification file is not in the current working directory, then the file name must be preceded by the path to the file, either in absolute terms (e.g. "C:/MyFiles/myModel.stan") or relative to the current directory (e.g. "./myModel.stan"). This is the only mandatory argument of the stan() function.
- "method" specifies the Stan method to be used. This can be one of the strings "sample", "variational" or "optimize", each one of them invoking the respective Stan method. Note that Stan’s “diagnose” method cannot be accessed in BayES directly, but diagnostic tests can be performed within Stan by setting the "diagnose" option to true in the stan() function. The default value of the "method" argument is "sample", in which case Stan samples from the posterior distribution of the model’s parameters using a Hamiltonian Monte Carlo (HMC) algorithm or fixed-parameter sampling (depending on other options).
- "data" specifies the data matrices that will be passed as input to Stan. <list of matrices> is a list of the id values of matrices (comma-separated names inside curly brackets), as they appear in the Stan model specification file. These matrices must be defined in the current workspace.
- "inits" specifies the initial values per chain used by Stan. <structure of initial values> is a BayES structure, the elements of which could be structures themselves. Each element of the chain-specific structure corresponds to a parameter or latent variable, using the same id values as the ones used in the Stan model specification file. It is possible to provide initial values for all parameters/latent variables or only a subset of them. It is also possible to leave entire chains uninitialized. In such cases Stan will generate initial values for the chains/parameters/latent variables which are not initialized by the user, using its default options.
- "output" specifies the file to which Stan should store its results. If the output file is not in the current directory, then the file name must be preceded by the path to the file, either in absolute terms (e.g. "C:/MyFiles/myResults.csv") or relative to the current working directory (e.g. "./myResults.csv"). If the output file is not specified, then BayES will create temporary files in the current working directory. If, however, the user provides a name for the output file(s), these will persist even after exiting BayES.
- "diagnostic" specifies the file to which Stan should store results that can be used for post-estimation diagnostics. If the diagnostic file is not in the current directory, then the file name must be preceded by the path to the file, either in absolute terms (e.g. "C:/MyFiles/myDgnstcs.csv") or relative to the current directory (e.g. "./myDgnstcs.csv"). If the user provides a name for the diagnostic file(s), these will persist even after exiting BayES.
- "summarize" indicates whether the results produced by Stan (draws or values at which the likelihood function is maximized) should be summarized within Stan, before returning control to BayES. The default value of "summarize" is true.
- "diagnose" indicates whether Stan should run diagnostic tests on the results it produced (draws or values at which the likelihood function is maximized), before returning control to BayES. The default value of "diagnose" is true. Note that this optional argument effectively replaces Stan’s “diagnose” method.
- "options" can be used to pass additional options to Stan, using its extensive argument tree. These options should be provided to BayES’ stan() function as a string, which is then passed verbatim to Stan. For example, setting the right-hand side of the "options" argument to "algorithm=fixed_param" requests Stan to use the fixed-parameter sampler under its “sample” method.
- "chains" specifies the number of chains that Stan will run in parallel. If the "method" argument of the stan() function is set to "sample" (default), BayES spawns as many Stan processes as the number of chains, runs them in parallel and mutes Stan’s output on the console. If, however, the "method" argument is set to either "variational" or "optimize", the "chains" argument is ignored.
- "burnin" performs different functions under different Stan methods. If the "method" argument of the stan() function is set to "sample" (default), "burnin" specifies the number of draws from the posterior that will be discarded (per chain) to avoid dependence of the results on initial values. If the "method" argument is set to "variational", "burnin" specifies the maximum number of ADVI iterations. This argument is ignored when the "method" argument is set to "optimize". The right-hand side must be a positive integer and the default value is 10,000.
- "draws" performs different functions under different Stan methods. If the "method" argument of the stan() function is set to "sample" (default) or "variational", "draws" specifies the number of draws from the posterior that will be retained, per chain. If the "method" argument is set to "optimize", "draws" specifies the maximum number of iterations of the algorithm that is used to maximize the likelihood. The right-hand side must be a positive integer and the default value is 20,000.
- "thin" specifies, when the "method" argument of the stan() function is set to "sample" (default), the thinning interval applied to the draws from the posterior after the burn-in phase: only one out of every "thin" consecutive draws is retained, which reduces the autocorrelation of the retained draws. For example, if the thinning parameter is set to 3, then only one in three consecutive draws will be retained and become available for inference and post-estimation analysis. The "thin" argument is ignored when the "method" argument is set to "variational" or "optimize". The right-hand side must be a positive integer and the default value is 1.
- "seed" specifies the seed for the random-number generator used by Stan. The right-hand side must be a positive integer and the default value is 42.
- "refresh" specifies the rate at which Stan prints information about its progress on the console. For example, if "refresh" is set equal to 100, Stan will print information every 100 iterations of the respective algorithm. The right-hand side must be a positive integer and the default value is 1000.
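The method-dependent roles of the "burnin", "draws" and "thin" arguments described above can be summarized in a short Python sketch. This is only an illustration of the documented behavior, not BayES or Stan code: the function names and dictionary keys are hypothetical, and keeping the first draw of each group when thinning is an assumption.

```python
def interpret_arguments(method, burnin=10_000, draws=20_000, thin=1):
    """Describe how "burnin", "draws" and "thin" are used under each method.

    Default values follow the ones stated in the text above.
    """
    if method == "sample":
        return {
            "discarded_per_chain": burnin,
            "retained_per_chain": draws,
            "thin": thin,
        }
    if method == "variational":
        # "thin" is ignored; "burnin" caps the number of ADVI iterations.
        return {"max_advi_iterations": burnin, "retained_draws": draws}
    if method == "optimize":
        # "burnin" and "thin" are ignored; "draws" caps the optimizer iterations.
        return {"max_iterations": draws}
    raise ValueError(f"unknown method: {method!r}")


def thin_draws(post_burnin_draws, thin):
    """Keep one out of every `thin` consecutive draws (assumed: the first)."""
    if thin < 1:
        raise ValueError("thin must be a positive integer")
    return post_burnin_draws[::thin]
```

For example, thin_draws(list(range(9)), 3) returns [0, 3, 6], so one in three consecutive draws is retained.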
The path to the Stan model specification file must not contain any spaces.
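A script that generates file names programmatically could guard against this restriction before calling stan(). The following Python sketch shows one way to do so; the function name is hypothetical and the check covers only the ordinary space character.

```python
def check_model_path(path):
    """Return `path` unchanged, or raise ValueError if it contains a space."""
    if " " in path:
        raise ValueError(f"path contains spaces: {path!r}")
    return path
```

With this check, "C:/MyFiles/myModel.stan" is accepted, while "C:/My Files/myModel.stan" raises an error.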
As the stan() function executes, Stan attempts to print output on the system’s command console. BayES captures this output and redirects it to the BayES main console in real time. This output is entirely determined by Stan and it includes information on the model specification file used in the current run, all the options used, any errors or warnings and, most importantly, information on the progress of the algorithm being used. Note that when multiple chains are run in parallel (under Stan’s “sample” method), BayES mutes Stan’s output on the console.
Many of the sample script files in "$BayESHOME/Samples/3JAGSOpenBUGSStan" contain examples of using the stan() function, along with Stan model specification files for simple models. The Stan interface is also accessible from the BayES main menu via Interfaces → Stan.
4. Arguments inside square brackets are optional. Optional arguments passed to the stan() function can be provided in any order, but always after the mandatory argument (model specification file). Optional arguments always come in pairs (e.g. "chains"=1).