Bayesian Econometrics Software

5.4 Random-coeﬃcients stochastic frontier

Mathematical representation

$y_{i t} = z_{i t}^{'} γ_{i} + x_{i t}^{'} β + v_{i t} \pm u_{i t}, \quad v_{i t} \sim N (0, \frac{1}{τ}), \quad γ_{i} \sim N (\bar{γ}, Ω^{- 1}), \quad u_{i t} \sim D (𝜃)$ (5.4)

the model is estimated using observations from $N$ groups, each group observed for $T_{i}$ periods (balanced or unbalanced panels); the total number of observations is $\sum_{i = 1}^{N} T_{i}$ and $T_{i}$ could be equal to one for all $i$ (cross-sectional data)
$y_{i t}$ is the value of the dependent variable for group $i$ , observed in period $t$
$z_{i t}$ is a $K \times 1$ vector that stores the values of the $K$ independent variables which are associated with group-speciﬁc coeﬃcients, for group $i$ , observed in period $t$
$x_{i t}$ is an $L \times 1$ vector that stores the values of the $L$ independent variables which are associated with coeﬃcients common to all groups, for group $i$ , observed in period $t$ ( $L$ could be zero)
$γ_{i}$ is a $K \times 1$ vector of parameters associated with group $i$
$\bar{γ}$ is a $K \times 1$ vector of parameters that represents the mean of the $γ_{i}$ s
$Ω$ is a $K \times K$ precision matrix for the distribution of the $γ_{i}$ s
$β$ is an $L \times 1$ vector of parameters
$τ$ is the precision of the noise component of the error term: $σ_{v}^{2} = \frac{1}{τ}$
$u_{i t}$ is the inefficiency component of the error term for group $i$ in period $t$ and it can have any non-negative distribution, represented in the equation above by $D (𝜃)$; BayES supports the following distributions for $u_{i t}$:
- exponential: $p (u_{i t}) = λ e^{- λ u_{i t}}$
- half normal: $p (u_{i t}) = \frac{2 ϕ^{1 ∕ 2}}{{(2 π)}^{1 ∕ 2}} exp \{- \frac{ϕ}{2} u_{i t}^{2}\}$
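
The data-generating process described above can be illustrated with a short simulation. The following is a minimal sketch in Python (not BayES code), assuming $K = L = 1$, a constant term as the only variable with a group-specific coefficient, a balanced panel and exponentially distributed inefficiency; the function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_sf_rc(n_groups, n_periods, gamma_bar, omega, beta, tau, lam,
                   production=True, seed=42):
    """Simulate observations from model (5.4) with K = L = 1 and exponential u."""
    rng = random.Random(seed)
    sign = -1.0 if production else 1.0       # minus: production, plus: cost
    sd_gamma = 1.0 / math.sqrt(omega)        # omega is a precision (K = 1)
    sd_v = 1.0 / math.sqrt(tau)              # tau is the precision of v
    rows = []
    for i in range(n_groups):
        gamma_i = rng.gauss(gamma_bar, sd_gamma)   # group-specific coefficient
        for t in range(n_periods):
            z = 1.0                                # constant term in the z list
            x = rng.uniform(0.0, 1.0)              # variable with a common coefficient
            v = rng.gauss(0.0, sd_v)               # noise component
            u = rng.expovariate(lam)               # inefficiency component, rate lam
            y = z * gamma_i + x * beta + v + sign * u
            rows.append({"id": i, "t": t, "y": y, "z": z, "x": x, "u": u})
    return rows

# 50 groups observed for 4 periods each (production frontier)
data = simulate_sf_rc(n_groups=50, n_periods=4, gamma_bar=1.0, omega=4.0,
                      beta=0.5, tau=25.0, lam=5.0)
```

Each $u_{i t}$ is drawn independently of all other draws, and the only difference between the production and the cost version of the simulated data is the sign with which $u_{i t}$ enters $y_{i t}$.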

When $u_{i t}$ enters the specification with a plus sign the model represents a cost frontier, while when $u_{i t}$ enters with a minus sign the model represents a production frontier. For the efficiency scores generated by a stochastic frontier model to be meaningful, the dependent variable in both cases must be in logarithms.

No time dependence is imposed on the inefficiency component of the error term: each $u_{i t}$ is treated as an independent draw from $D (𝜃)$.

Calculation of the log-marginal likelihood for the model is performed by integrating out the $u_{i t}$s from the complete-data likelihood by simulation. BayES multiplies the number of draws used for the estimation of the parameters by the maximum number of time observations per group ($\max_{i} \{T_{i}\}$) to determine the number of draws to be used for this integration. However, approximation of an integral of dimension $T_{i}$ for each group, $i$, by simulation may be imprecise if $T_{i}$ is large. Therefore, a large number of iterations is required to reduce the Monte Carlo standard error associated with the value of the log-marginal likelihood.
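
The effect of the number of draws on the precision of such an integral can be seen in a one-dimensional analogue. The sketch below (Python, not BayES code) integrates a single exponentially distributed $u$ out of the density of $ε = v - u$ by simulation and compares the result with the closed form that is available in this simple case. It is an illustration under these assumptions only and does not reproduce the $T_{i}$-dimensional integration performed by BayES; all names and values are hypothetical.

```python
import math
import random

def normal_pdf(e, sd):
    return math.exp(-0.5 * (e / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(q):
    return 0.5 * (1.0 + math.erf(q / math.sqrt(2.0)))

def density_by_simulation(eps, tau, lam, n_draws, rng):
    """Monte Carlo estimate of p(eps) for eps = v - u, averaging over u ~ Exp(lam)."""
    sd_v = 1.0 / math.sqrt(tau)
    total = 0.0
    for _ in range(n_draws):
        u = rng.expovariate(lam)
        total += normal_pdf(eps + u, sd_v)   # p(eps | u): v = eps + u is N(0, 1/tau)
    return total / n_draws

def density_closed_form(eps, tau, lam):
    """Normal-exponential convolution, used here only as a benchmark."""
    sd_v = 1.0 / math.sqrt(tau)
    return (lam * math.exp(lam * eps + 0.5 * (lam * sd_v) ** 2)
            * normal_cdf(-eps / sd_v - lam * sd_v))

rng = random.Random(42)
eps, tau, lam = -0.2, 11.0, 5.0
for n_draws in (100, 10_000, 100_000):
    approx = density_by_simulation(eps, tau, lam, n_draws, rng)
    print(n_draws, round(approx, 4), round(density_closed_form(eps, tau, lam), 4))
```

The simulated value approaches the benchmark as the number of draws grows; in higher dimensions the same accuracy typically requires many more draws.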

Priors


Parameter	Probability density function	Default hyperparameters

Common to all models
$\bar{γ}$	$p (\bar{γ}) = \frac{\| P_{γ} \|^{1 ∕ 2}}{{(2 π)}^{K ∕ 2}} exp \{- \frac{1}{2} {(\bar{γ} - m_{γ})}^{'} P_{γ} (\bar{γ} - m_{γ})\}$	$m_{γ} = 0_{K}$ , $P_{γ} = 0.001 \cdot I_{K}$
$Ω$	$p (Ω) = \frac{\| Ω \|^{\frac{n - K - 1}{2}} \| V^{- 1} \|^{n ∕ 2}}{2^{n K ∕ 2} Γ_{K} (\frac{n}{2})} exp \{- \frac{1}{2} tr (V^{- 1} Ω)\}$	$n = K^{2}$ , $V = \frac{100}{K} \cdot I_{K}$
$β$	$p (β) = \frac{\| P_{β} \|^{1 ∕ 2}}{{(2 π)}^{L ∕ 2}} exp \{- \frac{1}{2} {(β - m_{β})}^{'} P_{β} (β - m_{β})\}$	$m_{β} = 0_{L}$ , $P_{β} = 0.001 \cdot I_{L}$
$τ$	$p (τ) = \frac{b_{τ}^{a_{τ}}}{Γ (a_{τ})} τ^{a_{τ} - 1} e^{- τ b_{τ}}$	$a_{τ} = 0.001$ , $b_{τ} = 0.001$
Exponential model
$λ$	$p (λ) = \frac{b_{λ}^{a_{λ}}}{Γ (a_{λ})} λ^{a_{λ} - 1} e^{- λ b_{λ}}$	$a_{λ} = 1$ , $b_{λ} = 0.15$
Half normal model
$ϕ$	$p (ϕ) = \frac{b_{ϕ}^{a_{ϕ}}}{Γ (a_{ϕ})} ϕ^{a_{ϕ} - 1} e^{- ϕ b_{ϕ}}$	$a_{ϕ} = 7$ , $b_{ϕ} = 0.5$
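
To get a feel for what the default hyperparameters imply, the prior moments can be computed directly from the densities in the table. The following sketch (Python, not BayES code) uses the standard results that a Gamma density with shape $a$ and rate $b$ has mean $a ∕ b$ and variance $a ∕ b^{2}$, and that the Wishart density in the table has mean $n V$; the function names are hypothetical.

```python
def gamma_moments(shape, rate):
    """Mean and variance of a Gamma(shape, rate) density as parameterised in the table."""
    return shape / rate, shape / rate ** 2

# default (shape, rate) pairs taken from the table of priors
default_priors = {
    "tau": (0.001, 0.001),
    "lambda": (1.0, 0.15),
    "phi": (7.0, 0.5),
}
for name, (a, b) in default_priors.items():
    mean, var = gamma_moments(a, b)
    print(f"{name}: prior mean = {mean:.3f}, prior variance = {var:.3f}")

def wishart_mean_diagonal(K):
    """Diagonal element of E[Omega] = n * V under the defaults n = K^2, V = (100/K) I."""
    n = K ** 2
    return n * 100.0 / K
```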

Syntax

[<model name> = ] sf_rc( y ~ z1 z2 … zK [| x1 x2 … xL] [, <options>] );

where:

y is the dependent variable name, as it appears in the dataset used for estimation
z1 z2 $\dots$ zK is a list of the names, as they appear in the dataset used for estimation, of the independent variables which are associated with group-speciﬁc coeﬃcients; when a constant term is to be included in the set of group-speciﬁc coeﬃcients this must be requested explicitly
x1 x2 $\dots$ xL is a list of the names, as they appear in the dataset used for estimation, of the independent variables which are associated with coefficients common to all groups; when a constant term is to be included in the set of common coefficients, this must be requested explicitly

An independent variable could be included in either the x or the z variable list, depending on whether the parameter associated with this variable is common to all groups or not. However, including a variable in both lists would lead to exact multicollinearity and, in this case, BayES will issue an error.

Before using the sf_rc() function the dataset used for estimation must be declared as a panel dataset using the set_pd() function (see section B.13). In the case of cross-sectional data, the dataset still needs to be declared as a panel, but the group-id variable could be constructed as a list of unique integers using, for example, the range() function.

For groups observed only once, a group-specific parameter associated with a constant term cannot be distinguished from the error term ($v_{i t}$). Thus, a warning is produced when a constant term is included in the z list and the dataset contains at least one group which is observed only once.

The optional arguments for the random-coeﬃcients stochastic frontier model are:⁴

Gibbs parameters

"chains"	number of chains to run in parallel (positive integer); the default value is 1
"burnin"	number of burn-in draws per chain (positive integer); the default value is 10000
"draws"	number of retained draws per chain (positive integer); the default value is 20000
"thin"	value of the thinning parameter (positive integer); the default value is 1
"seed"	value of the seed for the random-number generator (positive integer); the default value is 42
Model specification

"udist"	speciﬁcation of the distribution of the ineﬃciency component of the error term; the following options are available, corresponding to the distributions presented at the beginning of this section: "exp" "hnorm" the default value is "exp"
"production"	boolean specifying the type of frontier (production/cost); it could be set to either true (production) or false (cost); the default value is true
Hyperparameters

Common to all models
"m_gamma"	mean vector of the prior for $\bar{γ}$ ( $K \times 1$ vector); the default value is $0_{K}$
"P_gamma"	precision matrix of the prior for $\bar{γ}$ ( $K \times K$ symmetric and positive-deﬁnite matrix); the default value is $0.001 \cdot I_{K}$
"V"	scale matrix of the prior for $Ω$ ( $K \times K$ symmetric and positive-deﬁnite matrix); the default value is $\frac{100}{K} \cdot I_{K}$
"n"	degrees-of-freedom parameter of the prior for $Ω$ (real number greater than or equal to $K$ ); the default value is $K^{2}$
"m_beta"	mean vector of the prior for $β$ ( $L \times 1$ vector); the default value is $0_{L}$
"P_beta"	precision matrix of the prior for $β$ ( $L \times L$ symmetric and positive-deﬁnite matrix); the default value is $0.001 \cdot I_{L}$
"a_tau"	shape parameter of the prior for $τ$ (positive number); the default value is $0.001$
"b_tau"	rate parameter of the prior for $τ$ (positive number); the default value is $0.001$
Exponential model
"a_lambda"	shape parameter of the prior for $λ$ (positive number); the default value is $1$
"b_lambda"	rate parameter of the prior for $λ$ (positive number); the default value is $0.15$
Half normal model
"a_phi"	shape parameter of the prior for $ϕ$ (positive number); the default value is $7$
"b_phi"	rate parameter of the prior for $ϕ$ (positive number); the default value is $0.5$
Dataset and log-marginal likelihood

"dataset"	the id value of the dataset that will be used for estimation; the default value is the ﬁrst dataset in memory (in alphabetical order)
"logML_CJ"	boolean indicating whether the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood should be calculated (true $\|$ false); the default value is false

Reported Parameters


Common to all models

$\bar{γ}$	variable_name	vector of parameters associated with the independent variables in the z list; these are the means of the group-speciﬁc parameters

$β$	variable_name	vector of parameters associated with the independent variables in the x list

$τ$	tau	precision parameter of the noise component of the error term, $v_{i t}$

$σ_{v}$	sigma_v	standard deviation of the noise component of the error term, $σ_{v} = 1 ∕ τ^{1 ∕ 2}$

Exponential model

$λ$	lambda	rate parameter of the distribution of the inefficiency component of the error term, $u_{i t}$

$σ_{u}$	sigma_u	scale parameter of the inefficiency component of the error term: $σ_{u} = 1 ∕ λ$ . For the exponential model the standard deviation of $u_{i t}$ is equal to the scale parameter.

Half normal model

$ϕ$	phi	precision parameter of the distribution of the inefficiency component of the error term, $u_{i t}$

$σ_{u}$	sigma_u	scale parameter of the inefficiency component of the error term: $σ_{u} = 1 ∕ ϕ^{1 ∕ 2}$ . The standard deviation of $u_{i t}$ for the half-normal model can be obtained as $σ_{u} \sqrt{1 - \frac{2}{π}}$ .
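
The transformations between the precision or rate parameters and the reported scale parameters are simple enough to apply by hand to individual draws. A minimal sketch in Python (not BayES code) follows; the function names and the draws of $ϕ$ are hypothetical and serve only as an illustration.

```python
import math

def sigma_v(tau):
    """Standard deviation of the noise component: 1 / tau^(1/2)."""
    return 1.0 / math.sqrt(tau)

def sigma_u_exponential(lam):
    """Scale of the exponential distribution; also the standard deviation of u."""
    return 1.0 / lam

def sigma_u_half_normal(phi):
    """Scale of the half-normal distribution: 1 / phi^(1/2)."""
    return 1.0 / math.sqrt(phi)

def sd_u_half_normal(phi):
    """Standard deviation of u in the half-normal model: sigma_u * sqrt(1 - 2/pi)."""
    return sigma_u_half_normal(phi) * math.sqrt(1.0 - 2.0 / math.pi)

# hypothetical posterior draws of phi, for illustration only
phi_draws = [12.5, 14.0, 15.5, 13.2]
sd_draws = [sd_u_half_normal(phi) for phi in phi_draws]
posterior_mean_sd = sum(sd_draws) / len(sd_draws)
```

Transforming each draw and then averaging, as done for `posterior_mean_sd`, gives the posterior mean of the transformed quantity; transforming the posterior mean of $ϕ$ would in general give a different number.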

Stored values and post-estimation analysis
If a left-hand-side id value is provided when a random-coeﬃcients stochastic frontier model is created, then the following results are saved in the model item and are accessible via the ‘.’ operator:

Samples	a matrix containing the draws from the posterior of $\bar{γ}$ , $β$ , $τ$ , the unique elements of $Ω$ and either $λ$ (exponential model) or $ϕ$ (half-normal model)
z1, $\dots$ ,zK	vectors containing the draws from the posterior of the mean of the group-speciﬁc coeﬃcients ( $\bar{γ}$ s) associated with variables z1, $\dots$ ,zK (the names of these vectors are the names of the variables that were included in the right-hand side of the model)
x1, $\dots$ ,xL	vectors containing the draws from the posterior of the parameters associated with variables x1, $\dots$ ,xL (the names of these vectors are the names of the variables that were included in the right-hand side of the model)
tau	vector containing the draws from the posterior of $τ$
lambda	vector containing the draws from the posterior of $λ$ (available after the estimation of the exponential model)
phi	vector containing the draws from the posterior of $ϕ$ (available after the estimation of the half-normal model)
Omega_i_j	vectors containing the draws from the posterior of the unique elements of $Ω$ ; because $Ω$ is symmetric, only $\frac{(K - 1) K}{2} + K$ of its elements are stored (instead of all $K^{2}$ elements); i and j index the row and column of $Ω$ , respectively, at which the corresponding element is located
Omega	$K \times K$ matrix that stores the posterior mean of $Ω$
logML	the Lewis & Raftery (1997) approximation of the log-marginal likelihood
logML_CJ	the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood; this is available only if the model was estimated with the "logML_CJ"=true option
gamma_i	$N \times K$ matrix that stores the group-speciﬁc coeﬃcients for the variables in the z list; the values in this matrix are not guaranteed to be in the same order as the order in which the groups appear in the dataset used for estimation; use the store() function to associate the values in gamma_i with the observations in the dataset
eff_i	$(\sum_{i = 1}^{N} T_{i}) \times 1$ vector that stores the expected values of the observation-speciﬁc eﬃciency scores, $E (e^{- u_{i t}})$ ; the values in this vector are not guaranteed to be in the same order as the order in which the observations appear in the dataset used for estimation; use the store() function to associate the values in eff_i with the observations in the dataset
nchains	the number of chains that were used to estimate the model
nburnin	the number of burn-in draws per chain that were used when estimating the model
ndraws	the total number of retained draws from the posterior ( $=$ chains $\cdot$ draws)
nthin	value of the thinning parameter that was used when estimating the model
nseed	value of the seed for the random-number generator that was used when estimating the model
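
The quantity stored in eff_i is the expected value of $e^{- u_{i t}}$, not $e^{- E (u_{i t})}$. Given draws of a single $u_{i t}$, such an expectation can be approximated by averaging the transformed draws. The sketch below (Python, not BayES code) shows the idea; how BayES obtains the values internally is not documented in this section, so this is an assumption, and the draws are hypothetical.

```python
import math

def efficiency_score(u_draws):
    """Monte Carlo estimate of E(exp(-u)) from draws of one u_it."""
    return sum(math.exp(-u) for u in u_draws) / len(u_draws)

# hypothetical draws of u_it for a single observation
u_draws = [0.05, 0.10, 0.30, 0.15, 0.20]
eff = efficiency_score(u_draws)

# exp(-E(u)) is a different quantity and is never larger than E(exp(-u))
plug_in = math.exp(-sum(u_draws) / len(u_draws))
```

Because $u_{i t}$ is non-negative, the resulting score always lies in the interval $(0, 1]$.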

Additionally, the following functions are available for post-estimation analysis (see section B.14):

diagnostics()
test()
pmp()
store()

The random-coeﬃcients stochastic frontier model uses the store() function to associate the group-speciﬁc parameters (gamma_i) or the estimates of the eﬃciency scores (eff_i) with speciﬁc observations and store their values in the dataset used for estimation. The generic syntax for a statement involving the store() function after estimation of a random-coeﬃcients stochastic frontier model and for each of these two quantities is:

store( gamma_i, <new variable name prefix> [, "model"=<model name>] );

and:

store( eff_i, <new variable name> [, "model"=<model name>] );

The ﬁrst statement will generate $K$ additional variables in the dataset used for estimation of the random-coeﬃcients model, with names constructed by prepending the preﬁx provided as the second argument to store() to the names of the variables which are associated with group-speciﬁc coeﬃcients. The second statement will generate one additional variable in the dataset used for estimation of the model and its name will be the one provided as the second argument to store().
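
The naming rule for the variables generated from gamma_i can be mimicked in a few lines. This is a sketch in Python (not BayES code) of the rule as described above; the helper name is hypothetical, and the prefix and variable names are only illustrative.

```python
def stored_variable_names(prefix, z_vars):
    """Names obtained by prepending the prefix to each variable in the z list."""
    return [prefix + name for name in z_vars]

# prefix "rc1_" with four variables in the z list gives K = 4 new variable names
names = stored_variable_names("rc1_", ["constant", "x1", "x2", "x3"])
print(names)
```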

Examples

Example 1

Example 2

myData = import("$BayESHOME/Datasets/dataset2.csv", ",");
myData.constant = ones(rows(myData), 1);
set_pd( year, id, "dataset" = myData);

// only the constant term and the coefficient associated with x1 are
// group-specific; the coefficients on x2 and x3 are common to all groups
sf_rc( y ~ constant x1 | x2 x3);

Example 3

myData = import("$BayESHOME/Datasets/dataset2.csv", ",");
myData.constant = ones(rows(myData), 1);
set_pd( year, id, "dataset" = myData);

// all rhs variables are associated with group-specific coefficients
model1 = sf_rc(y ~ constant x1 x2 x3, "udist"="hnorm",
"burnin"=10000, "draws"=40000, "thin"=4, "chains"=2,
"logML_CJ" = true, "dataset"=myData);
store( gamma_i, rc1_, "model" = model1 );

// only the constant term and the coefficient associated with x1 are
// group-specific; the coefficients on x2 and x3 are common to all groups
model2 = sf_rc( y ~ constant x1 | x2 x3, "udist"="hnorm",
"burnin"=10000, "draws"=40000, "thin"=4, "chains"=2,
"logML_CJ" = true, "dataset"=myData);
store( gamma_i, rc2_, "model" = model2 );

pmp( { model1, model2 } );
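
Posterior model probabilities are a function of the log-marginal likelihoods of the competing models and their prior probabilities. The following sketch (Python, not BayES code) shows the standard calculation, assuming equal prior model probabilities unless others are supplied; the function name and the log-marginal likelihood values are hypothetical and are not output from the example above.

```python
import math

def posterior_model_probabilities(log_mls, prior_probs=None):
    """Posterior model probabilities from log-marginal likelihoods."""
    if prior_probs is None:
        prior_probs = [1.0 / len(log_mls)] * len(log_mls)   # equal prior odds
    log_w = [lm + math.log(p) for lm, p in zip(log_mls, prior_probs)]
    m = max(log_w)                                # subtract the max to avoid underflow
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

# hypothetical log-marginal likelihoods for two competing models
probs = posterior_model_probabilities([-1234.5, -1236.0])
print(probs)
```

Subtracting the largest value before exponentiating keeps the calculation stable when the log-marginal likelihoods are large in absolute value.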

⁴Optional arguments are always given in option-value pairs (e.g. "chains"=3).
