### 6.9 Multivariate Probit

Mathematical representation

$$
\begin{array}{rcl}
y_{1i}^{*} &=& \mathbf{x}_{1i}'\beta_{1} + \varepsilon_{1i}\\
y_{2i}^{*} &=& \mathbf{x}_{2i}'\beta_{2} + \varepsilon_{2i}\\
&\vdots&\\
y_{Mi}^{*} &=& \mathbf{x}_{Mi}'\beta_{M} + \varepsilon_{Mi}
\end{array}
\qquad
y_{mi} = \begin{cases}
1 & \text{if } y_{mi}^{*} > 0\\
0 & \text{if } y_{mi}^{*} \le 0
\end{cases}
\qquad \forall\, m=1,2,\dots,M
\tag{6.17}
$$

where the $y_{mi}^{*}$'s are not observed but, as in the binary Probit and Logit models, the sign of each $y_{mi}^{*}$ determines the value of the corresponding observed $y_{mi}$.

• the model is estimated using $N$ observations
• ${y}_{mi}$ is the value of equation $m$’s dependent variable for observation $i$ and it can take two values: 0 and 1, $\forall m=1,2,\dots ,M$
• ${\mathbf{x}}_{mi}$ is a ${K}_{m}\phantom{\rule{0.3em}{0ex}}×\phantom{\rule{0.3em}{0ex}}1$ vector that stores the values of the ${K}_{m}$ independent variables for observation $i$, as they appear in equation $m$
• the same independent variable can appear in multiple equations, associated with diﬀerent coeﬃcients
• ${\beta }_{m}$ is a ${K}_{m}\phantom{\rule{0.3em}{0ex}}×\phantom{\rule{0.3em}{0ex}}1$ vector of parameters associated with equation $m$’s independent variables
• in total, there are $K\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\sum _{m=1}^{M}{K}_{m}$ $\beta$ parameters to be estimated
• the $M$ error terms jointly follow a multivariate Normal distribution with mean $\mathbf{0}$ and covariance matrix $\Sigma$
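As a concrete illustration of the system in (6.17), the following is a minimal NumPy sketch of the data-generating process. The coefficient values and the correlation matrix are hypothetical, and this is only an illustration of the model's structure, not BayES code:

```python
import numpy as np

def simulate_mvprobit(N, betas, Sigma, rng):
    """Simulate data from an M-equation multivariate Probit model.

    betas : list of M coefficient vectors (one K_m-vector per equation)
    Sigma : M x M error correlation matrix
    Returns the list of design matrices X and the N x M matrix of 0/1 outcomes.
    """
    M = len(betas)
    # regressors: a constant plus (K_m - 1) standard-normal variables per equation
    X = [np.column_stack([np.ones(N),
                          rng.standard_normal((N, len(b) - 1))])
         for b in betas]
    # latent utilities y* = x'beta + eps, errors jointly normal across equations
    eps = rng.multivariate_normal(np.zeros(M), Sigma, size=N)
    ystar = np.column_stack([X[m] @ betas[m] for m in range(M)]) + eps
    # observed outcomes are determined by the signs of the latent variables
    y = (ystar > 0).astype(int)
    return X, y

rng = np.random.default_rng(42)
betas = [np.array([0.5, 1.0]), np.array([-0.3, 0.8, -0.5])]  # hypothetical values
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
X, y = simulate_mvprobit(1000, betas, Sigma, rng)
```
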

A more compact representation of the model is:

$$\mathbf{y}_{i}^{*} = \mathbf{X}_{i}\beta + \varepsilon_{i}, \qquad \varepsilon_{i} \sim \mathrm{N}\left(\mathbf{0}, \Sigma\right)$$

where:

$$
\underset{M\times 1}{\mathbf{y}_{i}^{*}} = \begin{bmatrix} y_{1i}^{*}\\ y_{2i}^{*}\\ \vdots\\ y_{Mi}^{*} \end{bmatrix},\qquad
\underset{M\times K}{\mathbf{X}_{i}} = \begin{bmatrix} \mathbf{x}_{1i}' & \mathbf{0} & \dots & \mathbf{0}\\ \mathbf{0} & \mathbf{x}_{2i}' & \dots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \dots & \mathbf{x}_{Mi}' \end{bmatrix},\qquad
\underset{K\times 1}{\beta} = \begin{bmatrix} \beta_{1}\\ \beta_{2}\\ \vdots\\ \beta_{M} \end{bmatrix},\qquad
\underset{M\times 1}{\varepsilon_{i}} = \begin{bmatrix} \varepsilon_{1i}\\ \varepsilon_{2i}\\ \vdots\\ \varepsilon_{Mi} \end{bmatrix}
$$

where $\Sigma$ is the covariance matrix of $\varepsilon_{i}$: $\Sigma \equiv \Omega^{-1}$. For identification purposes, $\Sigma$ is normalized such that its diagonal elements are equal to one. Thus, $\Sigma$ is, in fact, a correlation matrix.
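The stacking in this compact form can be made concrete with a small numerical sketch. The regressor and coefficient values below are hypothetical; SciPy's `block_diag` builds the block-diagonal $\mathbf{X}_{i}$:

```python
import numpy as np
from scipy.linalg import block_diag

# one observation's stacked design matrix X_i (M x K) from the per-equation
# regressor vectors x_{mi}; beta stacks the per-equation coefficient vectors
x1 = np.array([1.0, 0.5])          # K_1 = 2 regressors in equation 1
x2 = np.array([1.0, -0.2, 0.7])    # K_2 = 3 regressors in equation 2
Xi = block_diag(x1, x2)            # 2 x 5 block-diagonal matrix

beta = np.concatenate([[0.5, 1.0], [-0.3, 0.8, -0.5]])  # K = 5 parameters
ystar_mean = Xi @ beta             # M-vector of latent means x_{mi}' beta_m
```
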

Priors

| Parameter | Probability density function | Default hyperparameters |
|---|---|---|
| $\beta$ | $p(\beta)=\frac{\lvert\mathbf{P}\rvert^{1/2}}{(2\pi)^{K/2}}\exp\left\{-\frac{1}{2}(\beta-\mathbf{m})'\mathbf{P}(\beta-\mathbf{m})\right\}$ | $\mathbf{m}=\mathbf{0}_{K}$, $\mathbf{P}=0.001\cdot\mathbf{I}_{K}$ |
| $\Sigma$ | $p(\breve{\Sigma})=\frac{\lvert\breve{\Sigma}\rvert^{-\frac{n+M+1}{2}}\lvert\mathbf{S}\rvert^{n/2}}{2^{nM/2}\,\Gamma_{M}\left(\frac{n}{2}\right)}\exp\left\{-\frac{1}{2}\mathrm{tr}\left(\mathbf{S}\breve{\Sigma}^{-1}\right)\right\}$ | $n=M^{2}$, $\mathbf{S}=\frac{100}{M}\cdot\mathbf{I}_{M}$ |

The prior for $\breve{\Sigma}$ is transformed such that $\mathrm{diag}(\Sigma)=\mathbf{1}_{M}$ and $\Sigma$ is a proper correlation matrix.

An inverse-Wishart prior is used for a positive-definite, but otherwise unrestricted, matrix, $\breve{\Sigma}$. This prior is then internally transformed to a prior for $\Sigma=\mathbf{D}\breve{\Sigma}\mathbf{D}$, where $\mathbf{D}$ is an $M\times M$ diagonal matrix constructed by taking the inverse of the square root of the diagonal elements of $\breve{\Sigma}$:
$$\mathbf{D}=\mathrm{diag}\left(\breve{\sigma}_{11}^{-1/2},\ \breve{\sigma}_{22}^{-1/2},\ \dots,\ \breve{\sigma}_{MM}^{-1/2}\right)$$
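The transformation $\Sigma=\mathbf{D}\breve{\Sigma}\mathbf{D}$ is easy to verify numerically. A short NumPy sketch, using a hypothetical $\breve{\Sigma}$:

```python
import numpy as np

def to_correlation(Sigma_breve):
    """Transform a positive-definite matrix into a correlation matrix
    via Sigma = D @ Sigma_breve @ D, with D = diag(sigma_mm^(-1/2))."""
    D = np.diag(1.0 / np.sqrt(np.diag(Sigma_breve)))
    return D @ Sigma_breve @ D

Sigma_breve = np.array([[4.0, 1.2],      # hypothetical covariance matrix
                        [1.2, 9.0]])
Sigma = to_correlation(Sigma_breve)
# the diagonal elements become exactly one, while the off-diagonal
# element becomes the correlation 1.2 / (2 * 3) = 0.2
```
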

The magnitude of the elements of the scale matrix, $\mathbf{S}$, in the prior for $\breve{\Sigma}$ affects the prior for $\Sigma$ only if $\mathbf{S}$ is not diagonal. The degrees-of-freedom parameter, $n$, in the prior for $\breve{\Sigma}$ can be used to control the dispersion of the implied prior density of $\Sigma$ around the prior expected value:
$$E(\Sigma)=\begin{cases}\mathbf{I}_{M} & \text{if } \mathbf{S} \text{ is diagonal}\\ \mathbf{D}\mathbf{S}\mathbf{D} & \text{if } \mathbf{S} \text{ is not diagonal}\end{cases}$$
In both cases, smaller values of $n$ allow larger deviations from $E(\Sigma)$.
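The role of $n$ can be illustrated by Monte Carlo: draw $\breve{\Sigma}$ from the inverse-Wishart prior, transform each draw to a correlation matrix, and compare the spread of the implied correlations for small versus large $n$. The sketch below uses SciPy with arbitrary values for $n$ and the number of draws:

```python
import numpy as np
from scipy.stats import invwishart

def corr12_draws(n, S, ndraws, seed):
    """Draw Sigma_breve from an inverse-Wishart(n, S) prior, transform each
    draw to a correlation matrix, and return the sampled (1,2) correlations."""
    draws = invwishart.rvs(df=n, scale=S, size=ndraws, random_state=seed)
    d = 1.0 / np.sqrt(draws[:, [0, 1], [0, 1]])  # inverse sqrt of diagonals
    return draws[:, 0, 1] * d[:, 0] * d[:, 1]    # off-diagonal of D S_breve D

M = 2
S = (100 / M) * np.eye(M)  # the default (diagonal) scale matrix
# smaller n allows larger deviations of Sigma from its expected value I_M
spread_small_n = corr12_draws(n=3, S=S, ndraws=5000, seed=1).std()
spread_large_n = corr12_draws(n=30, S=S, ndraws=5000, seed=1).std()
```
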

Because an inverse-Wishart distribution is used as the prior for $\breve{\Sigma}$, the prior expected value of $\breve{\Sigma}$ is $\frac{1}{n-M-1}\mathbf{S}$. Although the prior for $\breve{\Sigma}$ is transformed to a prior for $\Sigma$ such that the latter is a correlation matrix, this transformation involves simulation-based integration when $\mathbf{S}$ is not diagonal. Hyperparameter values that result in an expected value of $\breve{\Sigma}$ far from satisfying the conditions necessary for it to be a correlation matrix may therefore lead to numerical instability when calculating the log-marginal likelihood of the model. It is thus advised that $n$ be used to control the dispersion of $\Sigma$ around its expected value, and that $\mathbf{S}$ subsequently be defined such that $\frac{1}{n-M-1}\mathbf{S}$ is close to being a proper correlation matrix (positive definite, with values equal to one on the main diagonal).
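This advice can be followed mechanically: pick a target correlation matrix and scale it so that the prior mean of $\breve{\Sigma}$ reproduces it exactly. A NumPy sketch with an arbitrary, hypothetical target:

```python
import numpy as np

M = 3
n = M**2                          # degrees of freedom (controls dispersion)
R = np.array([[1.0, 0.3, 0.2],    # target prior correlation matrix
              [0.3, 1.0, 0.1],    # (hypothetical values)
              [0.2, 0.1, 1.0]])
S = (n - M - 1) * R               # so that E(Sigma_breve) = S/(n-M-1) = R

E = S / (n - M - 1)               # prior expected value of Sigma_breve
diag_ok = np.allclose(np.diag(E), 1.0)           # unit diagonal
pd_ok = bool(np.all(np.linalg.eigvalsh(E) > 0))  # positive definite
```
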

Syntax

$\left[$<model name> = $\right]$ mvprobit( {
y1 ~ x11 x12 $\dots$ x1K${}_{1}$,
y2 ~ x21 x22 $\dots$ x2K${}_{2}$,
$\dots$,
yM ~ xM1 xM2 $\dots$ xMK${}_{M}$ }
$\left[$, <options> $\right]$
);

where:

• y1, y2, …, yM are the dependent variable names, as they appear in the dataset used for estimation
• xm1 xm2 $\dots$xmK${}_{m}$ is a list of the ${K}_{m}$ independent variable names for equation $m=1,2,\dots ,M$, as they appear in the dataset used for estimation; when a constant term is to be included in an equation, this must be requested explicitly; $M$ such lists must be provided

All dependent variables, y1, y2, …, yM, in the dataset used for estimation must contain only two values: 0 and 1 (with 1 indicating "success" for the respective outcome). Observations with missing values in any dependent variable are dropped during estimation, but if a numerical value other than 0 and 1 is encountered, then an error is produced.
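The handling of missing and invalid values described above can be mimicked in a few lines. This is a NumPy sketch of the rule only, not BayES's actual implementation:

```python
import numpy as np

def validate_outcomes(Y):
    """Mimic the rule above: drop observations with missing dependent-variable
    values, then raise an error if any remaining value is neither 0 nor 1."""
    Y = np.asarray(Y, dtype=float)
    Y = Y[~np.isnan(Y).any(axis=1)]       # drop rows with missing values
    if not np.isin(Y, (0.0, 1.0)).all():  # any other numeric value: error
        raise ValueError("dependent variables must contain only 0 and 1")
    return Y.astype(int)

Y = [[1, 0], [0, np.nan], [1, 1]]
clean = validate_outcomes(Y)   # the observation with the missing value is dropped
```
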

The optional arguments for the multivariate Probit model are:9

**Gibbs parameters**

- "chains": number of chains to run in parallel (positive integer); the default value is 1
- "burnin": number of burn-in draws per chain (positive integer); the default value is 10000
- "draws": number of retained draws per chain (positive integer); the default value is 20000
- "thin": value of the thinning parameter (positive integer); the default value is 1
- "seed": value of the seed for the random-number generator (positive integer); the default value is 42

**Hyperparameters**

- "m": mean vector of the prior for $\beta$ ($K\times 1$ vector); the default value is $\mathbf{0}_{K}$
- "P": precision matrix of the prior for $\beta$ ($K\times K$ symmetric and positive-definite matrix); the default value is $0.001\cdot\mathbf{I}_{K}$
- "S": scale matrix of the prior for $\Sigma$ ($M\times M$ symmetric and positive-definite matrix); the default value is $\frac{100}{M}\cdot\mathbf{I}_{M}$
- "n": degrees-of-freedom parameter of the prior for $\Sigma$ (real number greater than or equal to $M$); the default value is $M^{2}$

**Dataset and log-marginal likelihood**

- "dataset": the id value of the dataset that will be used for estimation; the default value is the first dataset in memory (in alphabetical order)
- "logML_CJ": boolean indicating whether the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood should be calculated (true|false); the default value is false

Reported Parameters

- $\beta$ (reported per variable_name): vector of parameters associated with the independent variables; these are broken into groups according to the equation in which the independent variables appear

Stored values and post-estimation analysis
If a left-hand-side id value is provided when a multivariate Probit model is created, then the following results are saved in the model item and are accessible via the ‘.’ operator:

- Samples: a matrix containing the draws from the posterior of $\beta$ (across all equations, starting from the first equation) and the unique elements of $\Sigma$
- ym$xm1, $\dots$, ym$xmK${}_{m}$: vectors containing the draws from the posterior of the parameters associated with variables xm1, $\dots$, xmK${}_{m}$, for $m=1,2,\dots,M$ (the names of these vectors are the names of the variables that were included in the right-hand side of equation $m$, prepended by ym$, where ym is the name of the dependent variable in equation $m$; this is done so that the samples of the parameters associated with a variable that appears in more than one equation can be distinguished)
- Sigma_i_j: vectors containing the draws from the posterior of the unique elements of $\Sigma$; because $\Sigma$ is symmetric, only $\frac{(M-1)M}{2}+M$ of its elements are stored (instead of all $M^{2}$ elements), including the elements that are restricted to be equal to one; i and j index the row and column of $\Sigma$, respectively, at which the corresponding element is located
- Sigma: $M\times M$ matrix that stores the posterior mean of $\Sigma$
- logML: the Lewis & Raftery (1997) approximation of the log-marginal likelihood
- logML_CJ: the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood; this is available only if the model was estimated with the "logML_CJ"=true option
- nchains: the number of chains that were used to estimate the model
- nburnin: the number of burn-in draws per chain that were used when estimating the model
- ndraws: the total number of retained draws from the posterior ($=$ chains $\cdot$ draws)
- nthin: value of the thinning parameter that was used when estimating the model
- nseed: value of the seed for the random-number generator that was used when estimating the model

Additionally, the following functions are available for post-estimation analysis (see section B.14):

- diagnostics()
- test()
- pmp()
- mfx()
- predict()

The multivariate Probit model uses the mfx() function to calculate and report the marginal effects of the independent variables on the probability of success for each of the $M$ dependent variables. There are two types of marginal effects, which can be requested by setting the "type" argument of the mfx() function equal to 1 or 2:

1. when "type"=1 the marginal effects for each of the $M$ outcomes are calculated marginally with respect to the values of the remaining dependent variables
2. when "type"=2 the marginal effects are calculated conditionally on the values of the other dependent variables being equal to the values indicated by a vector z, passed to the mfx() function using the "opt" option; this vector must have dimension equal to $M$ and values equal to either 0 or 1; a value of 0 in the $m$-th position of z indicates that the $m$-th dependent variable is to be restricted to 0 when calculating marginal effects on the remaining variables, while a value of 1 indicates that it is to be restricted to 1

The generic syntax for a statement involving the mfx() function after estimation of a multivariate Probit model is:

mfx( $\left[$"type"=1$\right]$ $\left[$, "point"=<point of calculation>$\right]$ $\left[$, "model"=<model name>$\right]$ );

and:

mfx( "type"=2, "opt"=z $\left[$, "point"=<point of calculation>$\right]$ $\left[$, "model"=<model name>$\right]$ );

for calculating these two types of marginal effects, respectively. The default value of the "type" option is 1. See the general documentation of the mfx() function (section B.14) for details on the other optional arguments.

Although BayES can calculate marginal effects for the multivariate Probit model at each observation, the calculations may take an excessive amount of time to complete. This is because the GHK simulator needs to be invoked at every observed data point, each time using all draws from the posterior, thus leading to an immense number of computations.

The multivariate Probit model uses the predict() function to generate predictions of the probability of success for each of the $M$ dependent variables. There are two types of predictions, which can be requested by setting the "type" argument of the predict() function equal to 1 or 2:

1. when "type"=1 the predictions for each of the $M$ dependent variables are calculated marginally with respect to the values of the remaining dependent variables
2. when "type"=2 the predictions are generated conditionally on the values of the other dependent variables being equal to the values indicated by a vector z, passed to the predict() function using the "opt" option; this vector must have dimension equal to $M$ and values equal to either 0 or 1; a value of 0 in the $m$-th position of z indicates that the $m$-th dependent variable is to be restricted to 0 when generating predictions for the remaining variables, while a value of 1 indicates that it is to be restricted to 1

The generic syntax for a statement involving the predict() function after estimation of a multivariate Probit model is:

$\left[$<id value>$\right]$ = predict( $\left[$"type"=1$\right]$ $\left[$, "point"=<point of calculation>$\right]$ $\left[$, "model"=<model name>$\right]$ $\left[$, "stats"=true|false$\right]$ $\left[$, "prefix"=<prefix for new variable name>$\right]$ );

and:

$\left[$<id value>$\right]$ = predict( "type"=2, "opt"=z $\left[$, "point"=<point of calculation>$\right]$ $\left[$, "model"=<model name>$\right]$ $\left[$, "stats"=true|false$\right]$ $\left[$, "prefix"=<prefix for new variable name>$\right]$ );

for generating these two types of predictions, respectively. The default value of the "type" option is 1. See the general documentation of the predict() function (section B.14) for details on the other optional arguments.

Although BayES can generate summary statistics of the predictions for the multivariate Probit model at each observation, the calculations may take an excessive amount of time to complete. This is because the GHK simulator needs to be invoked at every observed data point, each time using all draws from the posterior, thus leading to an immense number of computations.

Examples

Example 1

myData = import("$BayESHOME/Datasets/dataset11.csv");
myData.constant = ones(rows(myData), 1);

myModel = mvprobit( {
y1 ~ constant x11 x12 x13 x14 x15,
y2 ~ constant x21 x22 x23 x24,
y3 ~ constant x21 x22 x23
} );

Example 2

myData = import("$BayESHOME/Datasets/dataset11.csv");
myData.constant = ones(rows(myData), 1);

myModel = mvprobit( {
y1 ~ constant x11 x12 x13 x14 x15,
y2 ~ constant x21 x22 x23 x24,
y3 ~ constant x21 x22 x23
},
"m"=ones(15,1), "P"=0.01*eye(15,15), "n"=5,
"burnin"=20000, "draws"=40000, "thin"=4, "chains"=2,
"logML_CJ" = true
);

mfx( "point"="mean", "model"=myModel );
x_for_mfx = [
1.0,1.0,1.0,1.0,1.0,0.0, // values for the variables in the 1st equation
1.0,0.0,0.5,0.5,0.5,     // values for the variables in the 2nd equation
1.0,0.0,0.5,0.5          // values for the variables in the 3rd equation
];
mfx( "point"=x_for_mfx, "model"=myModel );
mfx( "point"="mean", "model"=myModel, "type"=2, "opt"=[1,0,1]);

predict("prefix"=marg_);
predict("prefix"=cond_, "type"=2, "opt"=[1,0,1]);
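For intuition about why the mfx() and predict() calls above can be slow, the sketch below illustrates the idea behind the GHK simulator: the probability of one observed outcome pattern is approximated by factorizing the latent multivariate-normal distribution via its Cholesky factor and sampling sequentially from truncated univariate normals. This is a simplified NumPy/SciPy illustration with arbitrary parameter values, not BayES's implementation:

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(mu, Sigma, y, ndraws=2000, seed=7):
    """GHK estimate of P(y) for one observation of an M-variate Probit model:
    the latent vector is y* ~ N(mu, Sigma) and y_m = 1 iff y*_m > 0."""
    rng = np.random.default_rng(seed)
    M = len(mu)
    L = np.linalg.cholesky(Sigma)
    w = np.ones(ndraws)           # running product of conditional probabilities
    eta = np.zeros((ndraws, M))   # sequentially drawn (truncated) std normals
    for m in range(M):
        cond = mu[m] + eta[:, :m] @ L[m, :m]   # mean of y*_m given earlier draws
        lo = norm.cdf(-cond / L[m, m])         # P(y*_m <= 0 | earlier draws)
        w *= (1.0 - lo) if y[m] == 1 else lo   # probability of observed outcome
        u = rng.uniform(size=ndraws)
        if y[m] == 1:   # draw eta_m from N(0,1) truncated to the region y*_m > 0
            eta[:, m] = norm.ppf(lo + u * (1.0 - lo))
        else:           # ... truncated to the region y*_m <= 0
            eta[:, m] = norm.ppf(u * lo)
    return w.mean()

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
p = ghk_probability(np.array([0.3, -0.2]), Sigma, y=[1, 1])
```

In BayES this calculation is repeated for every observation and every retained posterior draw, which is what makes observation-level marginal effects and prediction summaries expensive.
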

9 Optional arguments are always given in option-value pairs (e.g., "chains"=3).