Bayesian Econometrics Software

5.7 Dynamic stochastic frontier

Mathematical representation

$$\begin{array}{lll} y_{i t} = x_{i t}^{'} β + v_{i t} \pm u_{i t}, & v_{i t} \sim N \left(0, \frac{1}{τ}\right) & (5.7) \\ s_{i t} = f (u_{i t}) & & (5.8) \\ s_{i t} = z_{i t}^{'} δ + ρ s_{i, t - 1} + ξ_{i t}, & ξ_{i t} \sim N \left(0, \frac{1}{ϕ}\right) & (5.9) \\ s_{i 1} = \frac{z_{i 1}^{'} δ}{1 - ρ} + ξ_{i 1}, & ξ_{i 1} \sim N \left(0, \frac{1}{ϕ {(1 - ρ)}^{2}}\right) & (5.10) \end{array}$$

the model is estimated using observations from $N$ groups, each group observed for $T_{i}$ periods (balanced or unbalanced panels); the total number of observations is $\sum_{i = 1}^{N} T_{i}$
$y_{i t}$ is the value of the dependent variable for group $i$ , observed in period $t$
$x_{i t}$ is a $K \times 1$ vector that stores the values of the $K$ independent variables for group $i$ , observed in period $t$
$β$ is a $K \times 1$ vector of parameters associated with the variables in the observed equation
$τ$ is the precision of the noise component of the error term in the observed equation: $σ_{v}^{2} = \frac{1}{τ}$
$u_{i t}$ is the inefficiency component of the error term
$s_{i t}$ is the value of the hidden-state variable for group $i$ in period $t$; this hidden state is a monotonic transformation of the inefficiency component of the error term, $s_{i t} = f (u_{i t})$, and controls the autoregressive process of efficiency; BayES supports the following transformations:
- $s_{i t} = \log (\frac{e^{- u_{i t}}}{1 - e^{- u_{i t}}})$ as presented in Emvalomatis (2011); in this formulation the efficiency score, $e^{- u_{i t}}$ , follows, conditional on $s_{i, t - 1}$ , a logit-normal distribution
- $s_{i t} = \log (u_{i t})$ as presented in Tsionas (2006); in this formulation $u_{i t}$ follows, conditional on $s_{i, t - 1}$ , a log-normal distribution
$z_{i t}$ is an $L \times 1$ vector that stores the values of the $L$ determinants of inefficiency, as they enter equation (5.9), for group $i$ , observed in period $t$
$δ$ is an $L \times 1$ vector of parameters associated with the variables in the hidden-state equation
$ρ$ is a parameter that measures the persistence of inefficiency
$ϕ$ is the precision of the error term in the hidden-state equation: $σ_{ξ}^{2} = \frac{1}{ϕ}$
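
The two transformations listed above can be inverted in closed form, which is how a value of the hidden state maps back to an inefficiency level and an efficiency score. The following is a minimal Python sketch of that algebra, not BayES code; the function names are hypothetical and the `udist` labels simply reuse the option names documented later in this section.

```python
import math

def inefficiency_to_state(u, udist="explogitn"):
    """Apply s = f(u) for u > 0 (function name is illustrative)."""
    if udist == "explogitn":
        eff = math.exp(-u)                  # efficiency score e^{-u}
        return math.log(eff / (1.0 - eff))  # logit of the efficiency score
    if udist == "logn":
        return math.log(u)
    raise ValueError("unknown udist: " + udist)

def state_to_inefficiency(s, udist="explogitn"):
    """Invert s = f(u) and return u > 0."""
    if udist == "explogitn":
        # s = log(e^{-u} / (1 - e^{-u}))  =>  e^{-u} = 1 / (1 + e^{-s})
        return math.log1p(math.exp(-s))
    if udist == "logn":
        # s = log(u)  =>  u = e^{s}
        return math.exp(s)
    raise ValueError("unknown udist: " + udist)

def efficiency(s, udist="explogitn"):
    """Efficiency score e^{-u} implied by a hidden-state value s."""
    return math.exp(-state_to_inefficiency(s, udist))
```

Under the first transformation a hidden state of zero corresponds to an efficiency score of one half; under the second it corresponds to $u_{i t} = 1$. In both cases a larger $s_{i t}$ moves efficiency monotonically, upward for the first transformation and downward for the second.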

When $u_{i t}$ enters the specification with a plus sign, the model represents a cost frontier; when $u_{i t}$ enters with a minus sign, the model represents a production frontier. For the efficiency scores generated by a stochastic frontier model to be meaningful, the dependent variable must be in logarithms in both cases.
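
To make the role of the sign concrete, the following Python sketch simulates one group from equations (5.7) to (5.10). It is an illustration under stated assumptions, not BayES code: the function name, the parameter values, the single standard-normal regressor, the constant-only z list and the use of the logit-type transformation are all hypothetical choices, and the first-period variance is taken exactly as printed in (5.10).

```python
import math
import random

def simulate_group(T, beta, delta, rho, tau, phi, production=True, seed=0):
    """Simulate T periods for one group; returns a list of dicts."""
    rng = random.Random(seed)
    sign = -1.0 if production else 1.0   # minus: production, plus: cost
    sd_v = 1.0 / math.sqrt(tau)          # sigma_v = tau^{-1/2}
    sd_xi = 1.0 / math.sqrt(phi)         # sigma_xi = phi^{-1/2}
    out = []
    s_prev = None
    for _ in range(T):
        x = [1.0, rng.gauss(0.0, 1.0)]   # constant plus one regressor
        z_delta = 1.0 * delta[0]         # constant-only z list
        if s_prev is None:
            # first period: mean and variance as printed in (5.10)
            s = z_delta / (1.0 - rho) + rng.gauss(0.0, sd_xi / (1.0 - rho))
        else:
            # later periods: autoregressive hidden state, equation (5.9)
            s = z_delta + rho * s_prev + rng.gauss(0.0, sd_xi)
        u = math.log1p(math.exp(-s))     # invert s = log(e^{-u}/(1-e^{-u}))
        v = rng.gauss(0.0, sd_v)
        xb = sum(xk * bk for xk, bk in zip(x, beta))
        out.append({"y": xb + v + sign * u, "xb": xb, "v": v, "u": u})
        s_prev = s
    return out

# Same seed, so both calls share the same draws and differ only in the sign of u.
prod = simulate_group(5, [1.0, 0.5], [0.5], 0.8, 25.0, 4.0, production=True)
cost = simulate_group(5, [1.0, 0.5], [0.5], 0.8, 25.0, 4.0, production=False)
```

With identical random draws, each cost-frontier observation exceeds its production-frontier counterpart by exactly $2 u_{i t}$, which is the only place the frontier type enters the observed equation.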

Because BayES uses Metropolis-Hastings updates when estimating dynamic stochastic frontier models, the draws from the posterior are likely to have very large autocorrelation times. Long burn-ins (above 30,000 draws) are therefore recommended, together with a large thinning parameter if machine memory is an issue.
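
The effect of autocorrelation and thinning can be illustrated outside BayES. The Python sketch below estimates the integrated autocorrelation time of a chain with a simple truncation rule (stop at the first non-positive autocorrelation) and compares a chain with its thinned version. A synthetic AR(1) series stands in for posterior draws; the function names, the coefficient 0.9 and the thinning value 10 are assumptions made for the example and are not part of BayES.

```python
import random

def autocorr_time(chain, max_lag=200):
    """Integrated autocorrelation time: 1 + 2 * sum of positive autocorrelations."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [c - mean for c in chain]
    var = sum(d * d for d in dev) / n
    tau = 1.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        acf = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if acf <= 0.0:      # truncate once the estimate is dominated by noise
            break
        tau += 2.0 * acf
    return tau

def ar1_chain(n, coef, seed=1):
    """Synthetic, strongly autocorrelated chain used in place of MCMC output."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = coef * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

chain = ar1_chain(20000, 0.9)
tau_full = autocorr_time(chain)        # theoretical value (1 + 0.9) / (1 - 0.9) = 19
tau_thin = autocorr_time(chain[::10])  # every 10th draw kept
```

Thinning does not add information, but the retained draws are much less correlated, so far fewer of them need to be stored to represent the posterior about as well.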

Priors


Parameter	Probability density function	Default hyperparameters

$β$	$p (β) = \frac{| P_{β} |^{1 / 2}}{{(2 π)}^{K / 2}} \exp \{- \frac{1}{2} {(β - m_{β})}^{'} P_{β} (β - m_{β})\}$	$m_{β} = 0_{K}$ , $P_{β} = 0.001 \cdot I_{K}$
$τ$	$p (τ) = \frac{b_{τ}^{a_{τ}}}{Γ (a_{τ})} τ^{a_{τ} - 1} e^{- τ b_{τ}}$	$a_{τ} = 0.001$ , $b_{τ} = 0.001$
$δ$	$p (δ) = \frac{| P_{δ} |^{1 / 2}}{{(2 π)}^{L / 2}} \exp \{- \frac{1}{2} {(δ - m_{δ})}^{'} P_{δ} (δ - m_{δ})\}$	$m_{δ} = 0_{L}$ , $P_{δ} = 0.001 \cdot I_{L}$
$ρ$	$p (ρ) = \frac{ρ^{a_{ρ} - 1} {(1 - ρ)}^{b_{ρ} - 1}}{B (a_{ρ}, b_{ρ})}$	$a_{ρ} = 4.0$ , $b_{ρ} = 2.0$
$ϕ$	$p (ϕ) = \frac{b_{ϕ}^{a_{ϕ}}}{Γ (a_{ϕ})} ϕ^{a_{ϕ} - 1} e^{- ϕ b_{ϕ}}$	$a_{ϕ} = 0.1$ , $b_{ϕ} = 0.01$
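
As a quick check on what the default hyperparameters imply, the Python sketch below evaluates the Beta and Gamma densities from the table and computes a few prior moments. It is illustrative only; the function names are hypothetical and nothing here calls BayES.

```python
import math

def beta_pdf(rho, a=4.0, b=2.0):
    """Beta density for rho in (0, 1); defaults match the prior on rho."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(rho) + (b - 1.0) * math.log(1.0 - rho) - log_B)

def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b, as used for tau and phi."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x)

# Moments implied by the default hyperparameters
rho_prior_mean = 4.0 / (4.0 + 2.0)                # a / (a + b) = 2/3
rho_prior_mode = (4.0 - 1.0) / (4.0 + 2.0 - 2.0)  # (a - 1) / (a + b - 2) = 0.75
phi_prior_mean = 0.1 / 0.01                       # shape / rate = 10
phi_prior_var = 0.1 / 0.01 ** 2                   # shape / rate^2 = 1000
```

The default prior on $ρ$ therefore places most of its mass on fairly persistent inefficiency, while the Gamma priors on the two precisions are very diffuse.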

Syntax

[<model name> = ] sf_dyn( y ~ x1 x2 ... xK | z1 z2 ... zL [, <options>] );

where:

y is the dependent variable name, as it appears in the dataset used for estimation
x1 x2 $\dots$ xK is a list of the $K$ independent variable names, as they appear in the dataset used for estimation; when a constant term is to be included in the model, this must be requested explicitly
z1 z2 $\dots$ zL is a list of the $L$ variable names that enter the hidden-state equation, as they appear in the dataset used for estimation; when a constant term is to be included in the hidden-state equation, this must be requested explicitly; a model can be run with an empty z list, but including at least a constant term is recommended

Groups that contain observations which are not consecutive according to the panel time variable (for example, a group is observed for two consecutive periods, not observed for the following period and observed again for another string of consecutive time periods) are split into multiple groups, with each string of consecutive observations treated as a different group. A warning is produced when the dataset used for estimation contains groups with gaps in the time dimension.
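
The splitting rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated, not BayES's internal routine; the function name and the example years are hypothetical.

```python
def split_consecutive(periods):
    """Split a group's time periods into runs of consecutive values."""
    runs = []
    for t in sorted(periods):
        if runs and t == runs[-1][-1] + 1:
            runs[-1].append(t)   # extends the current run
        else:
            runs.append([t])     # a gap (or the first period) starts a new run
    return runs

# A group observed in 2001-2002, missing in 2003, observed again in 2004-2006
runs = split_consecutive([2001, 2002, 2004, 2005, 2006])
```

Here `runs` holds two lists, one per string of consecutive periods, and each would be treated as a separate group with its own first-period equation (5.10).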

The optional arguments for the dynamic stochastic frontier model are:⁹

Gibbs parameters

"chains"	number of chains to run in parallel (positive integer); the default value is 1
"burnin"	number of burn-in draws per chain (positive integer); the default value is 10000
"draws"	number of retained draws per chain (positive integer); the default value is 20000
"thin"	value of the thinning parameter (positive integer); the default value is 1
"seed"	value of the seed for the random-number generator (positive integer); the default value is 42

Model specification

"udist"	specification of the distribution of the inefficiency component of the error term; the following options are available, corresponding to the transformations presented at the beginning of this section: "explogitn" (logit-normal efficiency score, Emvalomatis, 2011) or "logn" (log-normal inefficiency, Tsionas, 2006); the default value is "explogitn"
"production"	boolean specifying the type of frontier; it can be set to either true (production) or false (cost); the default value is true

Hyperparameters

"m_beta"	mean vector of the prior for $β$ ( $K \times 1$ vector); the default value is $0_{K}$
"P_beta"	precision matrix of the prior for $β$ ( $K \times K$ symmetric and positive-definite matrix); the default value is $0.001 \cdot I_{K}$
"a_tau"	shape parameter of the prior for $τ$ (positive number); the default value is $0.001$
"b_tau"	rate parameter of the prior for $τ$ (positive number); the default value is $0.001$
"m_delta"	mean vector of the prior for $δ$ ( $L \times 1$ vector); the default value is $0_{L}$
"P_delta"	precision matrix of the prior for $δ$ ( $L \times L$ symmetric and positive-definite matrix); the default value is $0.001 \cdot I_{L}$
"a_rho"	alpha parameter of the prior for $ρ$ (positive number); the default value is $4$
"b_rho"	beta parameter of the prior for $ρ$ (positive number); the default value is $2$
"a_phi"	shape parameter of the prior for $ϕ$ (positive number); the default value is $0.1$
"b_phi"	rate parameter of the prior for $ϕ$ (positive number); the default value is $0.01$

Dataset and log-marginal likelihood

"dataset"	the id value of the dataset that will be used for estimation; the default value is the first dataset in memory (in alphabetical order)
"logML_CJ"	boolean indicating whether the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood should be calculated (true | false); the default value is false

Reported Parameters


$β$	variable_name	vector of parameters associated with the independent variables in the x list
$τ$	tau	precision parameter of the noise component of the error term, $v_{i t}$
$δ$	variable_name	vector of parameters associated with the variables in the z list
$ρ$	rho	parameter that measures the persistence of inefficiency
$ϕ$	phi	precision parameter of the error term in the hidden-state equation, $ξ_{i t}$
$σ_{v}$	sigma_v	standard deviation of the noise component of the error term, $σ_{v} = 1 / τ^{1 / 2}$
$σ_{s}$	sigma_s	standard deviation of the error term in the hidden-state equation (denoted $σ_{ξ}$ in the model description), $σ_{s} = 1 / ϕ^{1 / 2}$
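
The relationship between the precision parameters and the reported standard deviations can be applied draw by draw. The Python sketch below does this for a handful of made-up precision draws; it assumes nothing about how BayES computes sigma_v or sigma_s internally, and the function name and the draw values are hypothetical.

```python
import math

def sd_draws(precision_draws):
    """Map draws of a precision parameter to draws of the standard deviation."""
    return [1.0 / math.sqrt(p) for p in precision_draws]

tau_draws = [4.0, 16.0, 25.0]       # hypothetical posterior draws of tau
sigma_v_draws = sd_draws(tau_draws)

# Averaging the transformed draws is not the same as transforming the average
mean_of_sd = sum(sigma_v_draws) / len(sigma_v_draws)
sd_of_mean = 1.0 / math.sqrt(sum(tau_draws) / len(tau_draws))
```

Because $1 / τ^{1 / 2}$ is a convex function of $τ$, the mean of the transformed draws is at least as large as the transformation of the mean, so a posterior mean of a standard deviation should be computed from transformed draws rather than from the posterior mean of the precision.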

Stored values and post-estimation analysis
If a left-hand-side id value is provided when a dynamic stochastic frontier model is created, then the following results are saved in the model item and are accessible via the ‘.’ operator:

Samples	a matrix containing the draws from the posterior of $β$ , $τ$ , $δ$ , $ρ$ and $ϕ$
y$x1, $\dots$ ,y$xK	vectors containing the draws from the posterior of the parameters associated with variables x1, $\dots$ ,xK (the names of these vectors are the names of the variables that were included in the right-hand side of the model, prepended by y$, where y is the name of the dependent variable; this is done so that the samples on the parameters associated with a variable that appears in both x and z lists can be distinguished)
tau	vector containing the draws from the posterior of $τ$
s$z1, $\dots$ ,s$zL	vectors containing the draws from the posterior of the parameters associated with variables z1, $\dots$ ,zL (the names of these vectors are the names of the variables that were included in the z list, in the right-hand side of the model, prepended by s$; this is done so that the samples on the parameters associated with a variable that appears in both x and z lists can be distinguished)
rho	vector containing the draws from the posterior of $ρ$
phi	vector containing the draws from the posterior of $ϕ$
logML	the Lewis & Raftery (1997) approximation of the log-marginal likelihood
logML_CJ	the Chib (1995)/Chib & Jeliazkov (2001) approximation to the log-marginal likelihood; this is available only if the model was estimated with the "logML_CJ"=true option
eff_i	$\sum_{i = 1}^{N} T_{i} \times 1$ vector (one entry per observation) that stores the expected values of the observation-specific efficiency scores, $E (e^{- u_{i t}})$ ; the values in this vector are not guaranteed to be in the same order as the order in which the observations appear in the dataset used for estimation; use the store() function to associate the values in eff_i with the observations in the dataset
nchains	the number of chains that were used to estimate the model
nburnin	the number of burn-in draws per chain that were used when estimating the model
ndraws	the total number of retained draws from the posterior ( $=$ chains $\cdot$ draws)
nthin	value of the thinning parameter that was used when estimating the model
nseed	value of the seed for the random-number generator that was used when estimating the model

Additionally, the following functions are available for post-estimation analysis (see section B.14):

diagnostics()
test()
pmp()
store()
mfx()

The dynamic stochastic frontier model uses the store() function to associate the estimates of the efficiency scores (eff_i) with specific observations and store their values in the dataset used for estimation. The generic syntax for a statement involving the store() function after estimation of a dynamic stochastic frontier model is:

store( eff_i, <new variable name> [, "model"=<model name>] );

Examples

Example 1

myData = import("$BayESHOME/Datasets/dataset1.csv", ",");
myData.constant = ones(rows(myData), 1);
set_pd( year, id, "dataset" = myData);

explogitnSF = sf_dyn( y ~ constant x1 x2 x3 | constant z2 );

Example 2

myData = import("$BayESHOME/Datasets/dataset1.csv", ",");
myData.constant = ones(rows(myData), 1);
set_pd( year, id, "dataset" = myData);

explogitnSF = sf_dyn( y ~ constant x1 x2 x3 | constant z2,
"udist" = "explogitn" );

lognSF = sf_dyn( y ~ constant x1 x2 x3 | constant z2,
"udist" = "logn" );

store( eff_i, eff_explogitn, "model" = explogitnSF );
store( eff_i, eff_logn, "model" = lognSF );

pmp( { explogitnSF, lognSF } );
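
The comparison in the last line of Example 2 rests on posterior model probabilities. Assuming these are formed in the usual way, as prior model probabilities multiplied by marginal likelihoods and then normalized, the arithmetic can be sketched in Python as follows. This is not BayES code; the function name and the log-marginal-likelihood values are hypothetical, and equal prior model probabilities are assumed by default.

```python
import math

def posterior_model_probs(log_mls, prior_probs=None):
    """Posterior model probabilities from log-marginal likelihoods."""
    k = len(log_mls)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k          # equal prior model probabilities
    logs = [lm + math.log(p) for lm, p in zip(log_mls, prior_probs)]
    m = max(logs)                            # subtract the maximum for stability
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: the first model's marginal likelihood is 3 times larger
probs = posterior_model_probs([-100.0, -100.0 - math.log(3.0)])
```

Working with differences of log-marginal likelihoods avoids underflow, since the marginal likelihoods themselves are typically far too small to represent directly.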

⁹Optional arguments are always given in option-value pairs (e.g. "chains"=3).
