### 6.7 Conditional Probit

Mathematical representation

 (6.13)
• the model is estimated using $N$ observations
• $y_{i}$ is the value of the dependent variable for observation $i$ and it can take $M+1$ values: $0, 1, \dots, M$
• $\mathbf{z}_{mi}$ is a $K \times 1$ vector that stores the values of the $K$ independent variables for observation $i$, which are specific to alternative $m$
• $\mathbf{x}_{i}$ is an $L \times 1$ vector that stores the values of the $L$ independent variables for observation $i$, which are common to all alternatives
• $\delta$ is a $K \times 1$ vector of parameters, common to all alternatives
• $\beta_{m}$ is an $L \times 1$ vector of parameters for alternative $m$, $m=1,2,\dots,M$
• there are $J=K+M\cdot L$ slope parameters to be estimated
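As a quick check on the dimensions, the parameter count above and the number of stored elements of $\Omega$ quoted later in this section can be worked out as follows. This is only an illustrative Python sketch; the function names are hypothetical and are not part of BayES.

```python
def n_slope_parameters(K, M, L):
    """J = K + M*L: K alternative-specific slopes (delta) plus
    L slopes (beta_m) for each of the M non-base alternatives."""
    return K + M * L


def n_unique_precision_elements(M):
    """Number of distinct elements of the symmetric M x M matrix Omega:
    (M-1)*M/2 off-diagonal elements plus M diagonal elements."""
    return (M - 1) * M // 2 + M


# Dimensions used in Example 2 of this section: K=3 (z, w, v),
# M=2 non-base alternatives, L=5 (constant, x1, ..., x4).
print(n_slope_parameters(K=3, M=2, L=5))   # 13
print(n_unique_precision_elements(M=2))    # 3
```

With these dimensions the prior mean "m" must be a 13×1 vector, which matches the `ones(3+2*5,1)` expression used in Example 2.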

The latent-variable representation of the conditional Probit is:

$$
\begin{array}{rcl}
y_{1i}^{\ast} & = & \mathbf{z}_{1i}^{\prime}\delta + \mathbf{x}_{i}^{\prime}\beta_{1} + \varepsilon_{1i} \\
y_{2i}^{\ast} & = & \mathbf{z}_{2i}^{\prime}\delta + \mathbf{x}_{i}^{\prime}\beta_{2} + \varepsilon_{2i} \\
 & \vdots & \\
y_{Mi}^{\ast} & = & \mathbf{z}_{Mi}^{\prime}\delta + \mathbf{x}_{i}^{\prime}\beta_{M} + \varepsilon_{Mi}
\end{array}
\qquad
y_{i}=\left\{
\begin{array}{lll}
0 & \text{if} & \max_{j}\left\{y_{ji}^{\ast}\right\} \le 0 \\
1 & \text{if} & \max_{j}\left\{y_{ji}^{\ast}\right\} = y_{1i}^{\ast} \ \text{ and } \ y_{1i}^{\ast} > 0 \\
\vdots & & \vdots \\
M & \text{if} & \max_{j}\left\{y_{ji}^{\ast}\right\} = y_{Mi}^{\ast} \ \text{ and } \ y_{Mi}^{\ast} > 0
\end{array}
\right.
\tag{6.14}
$$
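The mapping in (6.14) from latent variables to the observed outcome can be sketched in a few lines of Python. This is a minimal illustration of the rule only, not of how BayES implements it; the function names and the numeric values are hypothetical.

```python
def latent_utilities(z, x, delta, betas, eps):
    """Compute y*_m = z_m' delta + x' beta_m + eps_m for m = 1, ..., M.

    z     : list of M lists, each of length K (alternative-specific variables)
    x     : list of length L (variables common to all alternatives)
    delta : list of length K
    betas : list of M lists, each of length L
    eps   : list of length M (one error draw per alternative)
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    return [dot(z_m, delta) + dot(x, b_m) + e_m
            for z_m, b_m, e_m in zip(z, betas, eps)]


def observed_outcome(latent):
    """Apply the rule in (6.14): 0 if no latent variable is positive,
    otherwise the (1-based) index of the largest latent variable."""
    best = max(latent)
    if best <= 0:
        return 0
    return latent.index(best) + 1


# Hypothetical values with M=2, K=2, L=2 and the errors set to zero.
y_star = latent_utilities(
    z=[[1.0, 0.5], [0.0, 2.0]],
    x=[1.0, 3.0],
    delta=[0.2, -0.1],
    betas=[[0.5, -0.2], [-1.0, 0.1]],
    eps=[0.0, 0.0],
)
print(observed_outcome(y_star))
```

Ties between latent variables occur with probability zero under continuous errors, so the sketch simply returns the first maximiser.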

Let:

$$
\underset{J \times 1}{\beta}=\begin{bmatrix}\delta \\ \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{M}\end{bmatrix}
\qquad\text{and}\qquad
\underset{M \times 1}{\varepsilon_{i}}=\begin{bmatrix}\varepsilon_{1i} \\ \varepsilon_{2i} \\ \vdots \\ \varepsilon_{Mi}\end{bmatrix}
$$

$\varepsilon_{i}$ is assumed to follow a multivariate-Normal distribution: $\varepsilon_{i}\sim \mathrm{N}\left(\mathbf{0},\Omega^{-1}\right)$. For identification purposes the precision matrix is restricted such that $\mathrm{tr}\left(\Omega\right)=M$.
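A scale restriction of this kind is needed because multiplying every latent variable by the same positive constant leaves the observed outcome unchanged. The Python sketch below only illustrates what the trace restriction means for a given matrix, by rescaling it so that its trace equals $M$. It is not a description of how BayES imposes the restriction during sampling, and the function name is hypothetical.

```python
def rescale_to_trace(omega, target=None):
    """Return c * omega, with c chosen so that the trace equals `target`
    (by default the dimension M of the matrix)."""
    M = len(omega)
    if target is None:
        target = M
    trace = sum(omega[i][i] for i in range(M))
    if trace <= 0:
        raise ValueError("a positive-definite matrix must have a positive trace")
    c = target / trace
    return [[c * value for value in row] for row in omega]


# Hypothetical 2x2 precision matrix with trace 8; after rescaling the trace is 2.
omega = [[2.0, 0.5],
         [0.5, 6.0]]
restricted = rescale_to_trace(omega)
print(restricted)  # [[0.5, 0.125], [0.125, 1.5]]
```

Rescaling by a positive constant preserves symmetry and positive definiteness, so the result is still a valid precision matrix.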

Priors

| Parameter | Probability density function | Default hyperparameters |
|---|---|---|
| $\beta$ | $p\left(\beta\right)=\frac{\lvert\mathbf{P}\rvert^{1/2}}{\left(2\pi\right)^{J/2}}\exp\left\{-\frac{1}{2}\left(\beta-\mathbf{m}\right)^{\prime}\mathbf{P}\left(\beta-\mathbf{m}\right)\right\}$ | $\mathbf{m}=\mathbf{0}_{J}$, $\mathbf{P}=0.001\cdot\mathbf{I}_{J}$ |
| $\Omega$ | $p\left(\Omega\right)=\frac{\lvert\Omega\rvert^{\frac{n-M-1}{2}}\lvert\mathbf{V}^{-1}\rvert^{n/2}}{2^{nM/2}\,\Gamma_{M}\left(\frac{n}{2}\right)}\exp\left\{-\frac{1}{2}\mathrm{tr}\left(\mathbf{V}^{-1}\Omega\right)\right\}$ | $n=M^{2}$, $\mathbf{V}=\frac{100}{M}\cdot\mathbf{I}_{M}$ |

Following Burgette & Nordheim (2012), the prior for $\Omega$ is transformed such that $\mathrm{tr}\left(\Omega\right)=M$.

Syntax

[<model name> = ] cprobit( y ~ z1 z2 … zK [| x1 x2 … xL] [, <options>] );

where:

• y is the dependent variable name, as it appears in the dataset used for estimation
• z1 z2 … zK is a list of the names of the independent variables that vary by alternative, given as they appear in the dataset used for estimation but without the alternative index; for each ‘zk’ variable, the dataset must contain $M+1$ variables whose names start with zk, followed by an underscore and the index of the alternative to which the variable corresponds (counting from zero)
• x1 x2 … xL is a list of the names, as they appear in the dataset used for estimation, of the independent variables which are common to all alternatives; when a constant term is to be included in the model, this must be requested explicitly
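The naming rule for alternative-specific variables can be made concrete with a short Python sketch that lists the column names the dataset is expected to contain. The helper is hypothetical and only restates the rule above; it does not inspect any actual dataset.

```python
def alternative_specific_columns(names, M):
    """For each base name zk, list the M+1 expected dataset columns
    zk_0, zk_1, ..., zk_M (alternative indices start at zero)."""
    return [f"{name}_{m}" for name in names for m in range(M + 1)]


# For a model written as  y ~ z w v | ...  with M=2 non-base alternatives:
print(alternative_specific_columns(["z", "w", "v"], M=2))
# ['z_0', 'z_1', 'z_2', 'w_0', 'w_1', 'w_2', 'v_0', 'v_1', 'v_2']
```

Only the base names (z, w, v) appear in the model statement; the indexed columns are looked up in the dataset.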

 The dependent variable, y, in the dataset used for estimation must contain only consecutive integer values, with the numbering starting at 0 (base category). Observations with missing values in y are dropped during estimation, but if a non-integer numerical value is encountered or if the integer values are not consecutive (for example there are no observations for which ${y}_{i}=1$), then an error is produced.
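Before estimation it can be useful to check the dependent variable against these requirements. The following Python sketch mirrors the rules stated above (missing values dropped, integer values only, consecutive values starting at 0). It is an illustration written outside BayES: the function name is hypothetical, and missing values are assumed to be represented by `None` or NaN.

```python
def check_dependent_variable(values):
    """Validate y and return M, the largest category index.

    Missing values (None or NaN) are dropped. A ValueError is raised if a
    non-integer value is found or if the integer values are not the
    consecutive sequence 0, 1, ..., M.
    """
    kept = [v for v in values if v is not None and v == v]  # v != v only for NaN
    for v in kept:
        if not float(v).is_integer():
            raise ValueError(f"non-integer value in y: {v}")
    levels = sorted({int(v) for v in kept})
    if not levels or levels != list(range(len(levels))):
        raise ValueError(f"values of y are not consecutive from 0: {levels}")
    return levels[-1]


print(check_dependent_variable([0, 1, 2, 1, None, 0]))  # 2
```

A call such as `check_dependent_variable([0, 2, 2])` raises a `ValueError`, corresponding to the case where no observation has $y_{i}=1$.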

The optional arguments for the conditional Probit model are:⁷

| Option | Description |
|---|---|
| **Gibbs parameters** | |
| "chains" | number of chains to run in parallel (positive integer); the default value is 1 |
| "burnin" | number of burn-in draws per chain (positive integer); the default value is 10000 |
| "draws" | number of retained draws per chain (positive integer); the default value is 20000 |
| "thin" | value of the thinning parameter (positive integer); the default value is 1 |
| "seed" | value of the seed for the random-number generator (positive integer); the default value is 42 |
| **Hyperparameters** | |
| "m" | mean vector of the prior for $\beta$ ($J \times 1$ vector); the default value is $\mathbf{0}_{J}$ |
| "P" | precision matrix of the prior for $\beta$ ($J \times J$ symmetric and positive-definite matrix); the default value is $0.001\cdot\mathbf{I}_{J}$ |
| "V" | scale matrix of the prior for $\Omega$ ($M \times M$ symmetric and positive-definite matrix); the default value is $\frac{100}{M}\cdot\mathbf{I}_{M}$ |
| "n" | degrees-of-freedom parameter of the prior for $\Omega$ (real number greater than or equal to $M$); the default value is $M^{2}$ |
| **Dataset and log-marginal likelihood** | |
| "dataset" | the id value of the dataset that will be used for estimation; the default value is the first dataset in memory (in alphabetical order) |
| "logML_CJ" | boolean indicating whether the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood should be calculated (true\|false); the default value is false |

Reported Parameters

| Parameter | Name | Description |
|---|---|---|
| $\delta$ | variable_name | vector of parameters associated with the independent variables that vary by alternative |
| $\beta$ | variable_name | vector of parameters associated with the independent variables which are common to all alternatives; these are broken into groups according to the alternative, $m$, the parameter is associated with |

Stored values and post-estimation analysis
If a left-hand-side id value is provided when a conditional Probit model is created, then the following results are saved in the model item and are accessible via the ‘.’ operator:

| Name | Description |
|---|---|
| Samples | a matrix containing the draws from the posterior of $\beta$ (including $\delta$ and the $\beta$s across all alternatives, starting from the first one) and the unique elements of $\Omega$ |
| z1, …, zK | vectors containing the draws from the posterior of the parameters associated with the variables that vary by alternative, z1, …, zK (the names of these vectors are the names of the variables as they were included in the right-hand side of the model, excluding the alternative index) |
| y_m\$x1, …, y_m\$xL | vectors containing the draws from the posterior of the parameters associated with the variables that are common to all alternatives, x1, …, xL, for $m=1,2,\dots,M$ (the names of these vectors are the names of the variables that were included in the right-hand side of the model, prepended by y_m\$, where y_m is the name of the dependent variable followed by an underscore and the index of the alternative; this is done so that the samples on the parameters associated with the same independent variable but for different alternatives can be distinguished) |
| Omega_i_j | vectors containing the draws from the posterior of the unique elements of $\Omega$; because $\Omega$ is symmetric, only $\frac{\left(M-1\right)M}{2}+M$ of its elements are stored (instead of all $M^{2}$ elements); i and j index the row and column of $\Omega$, respectively, at which the corresponding element is located |
| Omega | $M \times M$ matrix that stores the posterior mean of $\Omega$; $\Omega$ is restricted such that $\mathrm{tr}\left(\Omega^{-1}\right)=M$ |
| logML | the Lewis & Raftery (1997) approximation of the log-marginal likelihood |
| logML_CJ | the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood; this is available only if the model was estimated with the "logML_CJ"=true option |
| nchains | the number of chains that were used to estimate the model |
| nburnin | the number of burn-in draws per chain that were used when estimating the model |
| ndraws | the total number of retained draws from the posterior ($=$ chains $\cdot$ draws) |
| nthin | value of the thinning parameter that was used when estimating the model |
| nseed | value of the seed for the random-number generator that was used when estimating the model |

Additionally, the following functions are available for post-estimation analysis (see section B.14):

• diagnostics()
• test()
• pmp()
• mfx()
• predict()

The conditional Probit model uses the mfx() function to calculate and report the marginal effects of the independent variables on the probability of each outcome, $m$, occurring. Because the model calculates only one type of marginal effects, the only valid value for the "type" option is 1. The generic syntax for a statement involving the mfx() function after estimation of a conditional Probit model is:

mfx( ["type"=1] [, "point"=<point of calculation>] [, "model"=<model name>] );

See the general documentation of the mfx() function (section B.14) for details on the other optional arguments.

Although BayES can calculate marginal effects for the conditional Probit model at each observation, the calculations may take an excessive amount of time to complete. This is because the GHK simulator needs to be invoked at every observed data point, each time using all draws from the posterior, thus leading to an immense number of computations.

The conditional Probit model uses the predict() function to generate predictions of the probability of each of the $M+1$ outcomes occurring. Because the model generates only one type of predictions, the only valid value for the "type" option is 1. The generic syntax for a statement involving the predict() function after estimation of a conditional Probit model is:

[<id value> = ] predict( ["type"=1] [, "point"=<point of calculation>] [, "model"=<model name>] [, "stats"=true|false] [, "prefix"=<prefix for new variable name>] );

See the general documentation of the predict() function (section B.14) for details on the other optional arguments.

Although BayES can generate summary statistics of the predictions for the conditional Probit model at each observation, the calculations may take an excessive amount of time to complete. This is because the GHK simulator needs to be invoked at every observed data point, each time using all draws from the posterior, thus leading to an immense number of computations.

Examples

Example 1

myData = import("$BayESHOME/Datasets/dataset7.csv");
myData.constant = ones(rows(myData), 1);

cprobit( y ~ z w v | constant );

Example 2

myData = import("$BayESHOME/Datasets/dataset7.csv");
myData.constant = ones(rows(myData), 1);

myModel = cprobit( y ~ z w v | constant x1 x2 x3 x4,
    "m"=ones(3+2*5,1), "P"=0.01*eye(3+2*5,3+2*5),
    "n"=10, "V"=eye(2,2),
    "burnin"=20000, "draws"=40000, "thin"=4, "chains"=2,
    "logML_CJ" = true );

diagnostics("model"=myModel);

kden(myModel.z, "title" = "delta1");
kden(myModel.y_2$x3, "title" = "beta3 for the 2nd alternative");

margeff_mean = mfx("point"="mean","model"=myModel);
margeff_median = mfx("point"="median","model"=myModel);
x_for_mfx = [
0.0,0.0,0.05,         // z, w, v for the base alternative
1.0,1.1,0.16,         // z, w, v for the 1st alternative
1.0,1.0,0.14,         // z, w, v for the 2nd alternative
1.0,1.0,0.5,2.0,0.0   // x variables
];
margeff_atx = mfx("point"=x_for_mfx,"model"=myModel);

predict();
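The point of calculation passed to mfx() in Example 2 stacks the alternative-specific variables for each of the $M+1$ alternatives (base alternative first) and then the variables that are common to all alternatives. Assuming this layout holds in general, which is inferred here from the comments in the example rather than from the mfx() documentation, the expected length of such a vector is $(M+1)\cdot K+L$. The Python sketch below, with hypothetical helper names, checks the length and splits a point into its blocks.

```python
def point_vector_length(K, M, L):
    """Length of a user-supplied point: K values for each of the M+1
    alternatives, followed by L values common to all alternatives."""
    return (M + 1) * K + L


def split_point(point, K, M, L):
    """Split a stacked point into (z blocks per alternative, x block)."""
    if len(point) != point_vector_length(K, M, L):
        raise ValueError("point has the wrong number of elements")
    z_blocks = [point[m * K:(m + 1) * K] for m in range(M + 1)]
    x_block = point[(M + 1) * K:]
    return z_blocks, x_block


# The values used for x_for_mfx in Example 2 (K=3, M=2, L=5).
x_for_mfx = [0.0, 0.0, 0.05,
             1.0, 1.1, 0.16,
             1.0, 1.0, 0.14,
             1.0, 1.0, 0.5, 2.0, 0.0]
z_blocks, x_block = split_point(x_for_mfx, K=3, M=2, L=5)
print(len(x_for_mfx))  # 14
```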

⁷ Optional arguments are always given in option-value pairs (e.g. "chains"=3).