B.13 Statements for working with datasets
For all practical purposes, datasets in BayES are matrices with additional structure. This means that if, for example, D is a dataset, then indexing operations and functions that operate on matrices work on D exactly as they would if D were a matrix. There are, however, some additional functions and statements that work on datasets but not on matrices. These are documented in the following table.
Syntax  Arguments and performed function 
X = D.varname;  X is a column vector with entries equal to the values of variable varname in dataset D.

D.varname = <math expression>;  Creates a new variable called varname (defined by <math expression>) and adds it to dataset D. If D already has a variable called varname then its values are replaced.

clear(D.varname);  Deletes the variable called varname from dataset D.

D = dataset(A [, {ID1, ID2, ...}]);  D is a dataset constructed from the data contained in matrix A. {ID1, ID2, ...} is a list of id values, enclosed in curly brackets, to be used as the variable names. If variable names are not provided, the variables are named _V1, _V2, etc.
see also import 
rename(D.oldname, newname);  Renames variable oldname in dataset D to newname.
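The statements above for creating, accessing, deleting and renaming variables can be pictured with a small Python model that treats a dataset as an ordered mapping from variable names to equal-length columns. This is a hedged sketch, not BayES code: the class `Dataset` and its methods `get`, `set`, `clear` and `rename` are hypothetical names that only mirror the documented behavior, and the length check in `set` is an assumption.

```python
# Hypothetical Python model of a BayES dataset: an ordered mapping from
# variable names to equal-length columns. Illustration only, not BayES code.
class Dataset:
    def __init__(self, rows, names=None):
        ncols = len(rows[0]) if rows else 0
        if names is None:
            # Mirror the documented default names _V1, _V2, ...
            names = [f"_V{j + 1}" for j in range(ncols)]
        if len(names) != ncols:
            raise ValueError("need exactly one name per column")
        self.nobs = len(rows)
        self.cols = {name: [row[j] for row in rows] for j, name in enumerate(names)}

    def get(self, name):
        """X = D.varname;  returns a copy of the column."""
        return list(self.cols[name])

    def set(self, name, values):
        """D.varname = <expr>;  creates the variable, or replaces it if it exists."""
        values = list(values)
        # Assumption: a new variable must have one value per observation.
        if len(values) != self.nobs:
            raise ValueError("new variable must have one value per observation")
        self.cols[name] = values

    def clear(self, name):
        """clear(D.varname);  deletes the variable."""
        del self.cols[name]

    def rename(self, old, new):
        """rename(D.oldname, newname);  keeps the variable's position."""
        if new in self.cols:
            raise ValueError(f"variable {new} already exists")
        self.cols = {(new if k == old else k): v for k, v in self.cols.items()}


# Usage: build a 3 x 2 dataset without names, then manage its variables.
D = Dataset([[1, 2], [3, 4], [5, 6]])
D.set("y", [a + b for a, b in zip(D.get("_V1"), D.get("_V2"))])
D.rename("_V1", "x")
D.clear("_V2")
print(list(D.cols))  # ['x', 'y']
```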

keepif(<condition> [,"dataset"=D]);  Keeps the observations in dataset D that satisfy the logical <condition>. The remaining observations (those that do not satisfy <condition>) are permanently deleted from D. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also dropif and dropmissing 
dropif(<condition> [,"dataset"=D]);  Drops (permanently deletes) the observations in dataset D that satisfy the logical <condition>. The remaining observations (those that do not satisfy <condition>) are retained. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also keepif and dropmissing 
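Because keepif and dropif apply opposite rules to the same <condition>, applying each to a copy of the same data splits the observations into two complementary sets. The Python sketch below illustrates this on a list of observations stored as dicts; the functions are hypothetical stand-ins for the BayES statements and, like them, overwrite their input and return nothing.

```python
# Sketch of keepif/dropif semantics on a list of observations (dicts).
# Like the BayES statements, both modify the data in place and return None.
def keepif(rows, condition):
    """Keep the observations that satisfy condition; delete the rest."""
    rows[:] = [row for row in rows if condition(row)]


def dropif(rows, condition):
    """Delete the observations that satisfy condition; keep the rest."""
    rows[:] = [row for row in rows if not condition(row)]


data = [{"x": 1}, {"x": 5}, {"x": 9}]
kept, dropped = list(data), list(data)
keepif(kept, lambda r: r["x"] > 3)
dropif(dropped, lambda r: r["x"] > 3)
print(kept)     # [{'x': 5}, {'x': 9}]
print(dropped)  # [{'x': 1}]
```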
W = dropmissing(X);  W is a dataset constructed by reading the entries of X, row by row, skipping any rows of X that contain at least one missing value. An error is produced if dropping the rows of X with missing values leaves an empty dataset.
When the argument provided to dropmissing() is a matrix, the function returns a matrix; for this reason, the function is also documented in Section B.5.
see also dropif and keepif
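A minimal Python sketch of the dropmissing behavior described above, assuming missing values are represented as NaN and the data are a list of rows. The function name mirrors the BayES one but is only an illustration; unlike keepif and dropif, it returns a new object rather than overwriting its argument, matching the W = dropmissing(X) form.

```python
import math


# Sketch of dropmissing semantics, assuming missing values are stored as NaN.
def dropmissing(rows):
    """Return the rows with no missing value, in their original order."""
    kept = [row for row in rows if not any(math.isnan(v) for v in row)]
    if not kept:
        # Mirrors the documented error when no observations remain.
        raise ValueError("dropping rows with missing values left no observations")
    return kept


X = [[1.0, 2.0], [math.nan, 3.0], [4.0, 5.0]]
print(dropmissing(X))  # [[1.0, 2.0], [4.0, 5.0]]
```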
sortd(<variables> [,"dataset"=D]);  Sorts the data in dataset D in ascending order according to the values of the variables whose names are provided in the <variables> list. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also sort and sortrows 
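The table does not state how ties are handled when several variables are given to sortd. The Python sketch below assumes the usual lexicographic rule, in which the second variable orders observations that share the same value of the first, and so on; this is an assumption, not documented BayES behavior.

```python
# Sketch of sortd semantics: ascending sort on several variables, assuming
# later variables break ties in earlier ones (lexicographic order).
def sortd(rows, variables):
    """Sort observations (dicts) in place; returns None like the BayES statement."""
    rows.sort(key=lambda row: tuple(row[v] for v in variables))


data = [
    {"id": 2, "year": 2001},
    {"id": 1, "year": 2002},
    {"id": 2, "year": 2000},
    {"id": 1, "year": 2001},
]
sortd(data, ["id", "year"])
print([(r["id"], r["year"]) for r in data])
# [(1, 2001), (1, 2002), (2, 2000), (2, 2001)]
```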
summary(<variables> [,"dataset"=D]);  Calculates and prints summary statistics of the variables in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.

set_ts(tid [,"dataset"=D]);  Declares dataset D as a time series, with tid being the variable that identifies time periods. If D is not provided then the function operates on the first dataset available in the current workspace.

set_pd(tid, pid [,"dataset"=D]);  Declares dataset D as a panel, with tid being the variable that identifies time periods and pid the variable that identifies groups. If D is not provided then the function operates on the first dataset available in the current workspace.

set_cs(["dataset"=D]);  Clears any time structure that was set on dataset D by a call to either set_ts or set_pd. If D is not provided then the function operates on the first dataset available in the current workspace.

X = lag(varname [,l,"dataset"=D]);  X is a column vector obtained by taking lags of length l of the variable varname in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.
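The following Python sketch shows what lagging a single time series by l periods means: entry t of the result holds entry t - l of the input. Padding the first l entries with NaN and the default l = 1 are assumptions made for the illustration, and the sketch ignores any panel structure declared with set_pd.

```python
import math


# Sketch of lag semantics for a single time series. Assumptions (not from the
# BayES documentation): the first l entries, which have no lagged value, are
# set to NaN, and the default lag length is 1. Panel structure is ignored.
def lag(values, l=1):
    """Return the series lagged by l periods: entry t holds values[t - l]."""
    if l < 0:
        raise ValueError("lag length must be non-negative")
    values = list(values)
    n = len(values)
    return [math.nan] * min(l, n) + values[: max(n - l, 0)]


print(lag([10.0, 11.0, 12.0, 13.0], 2))  # [nan, nan, 10.0, 11.0]
```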

X = diff(varname [,o,l,"dataset"=D]);  X is a column vector obtained by taking seasonal differences of order o and seasonal length l of the variable varname in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.
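Seasonal differencing of order o and seasonal length l is conventionally defined with the lag operator L as (1 - L^l)^o y_t; for o = 1 this is simply y_t - y_{t-l}. The Python sketch below applies that textbook definition, assuming that BayES follows it and pads entries without enough history with NaN; the function name `seasonal_diff` and its defaults are hypothetical.

```python
import math


# Sketch of seasonal differencing: apply (1 - L^l) to the series o times,
# where L is the lag operator. Entries without enough history become NaN
# (an assumption about how BayES pads the result).
def seasonal_diff(values, o=1, l=1):
    out = list(values)
    for _ in range(o):
        # One pass of (1 - L^l): y_t - y_{t-l}; NaN propagates through subtraction.
        out = [math.nan if t < l else out[t] - out[t - l] for t in range(len(out))]
    return out


y = [1.0, 2.0, 4.0, 7.0, 11.0]
print(seasonal_diff(y, o=1, l=2))  # [nan, nan, 3.0, 5.0, 7.0]
print(seasonal_diff(y, o=2, l=1))  # [nan, nan, 1.0, 1.0, 1.0]
```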

X = groupmeans(varname [,"dataset"=D]);  X is a column vector obtained by calculating the arithmetic mean, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and missing values are generated for any group in which varname has only missing values. If D is not provided then the function operates on the first dataset available in the current workspace.
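The broadcasting behavior of groupmeans (one group statistic repeated on every observation of that group) can be sketched in Python as follows. The group labels are passed explicitly, missing values are assumed to be NaN, and NaN entries are assumed to be skipped when a mean is computed, which is consistent with missing values appearing only for groups that have no observed values. The same pattern would apply to the other per-group statistics in this table.

```python
import math
from collections import defaultdict


# Sketch of groupmeans semantics. Assumptions: groups are given explicitly
# (e.g. the values of the panel identifier), missing values are NaN, and NaN
# entries are skipped when computing a group's mean.
def groupmeans(values, groups):
    """Per-group mean of values, repeated for every observation of the group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        if not math.isnan(v):
            sums[g] += v
            counts[g] += 1
    # A group with only missing values has count 0 and gets NaN.
    return [sums[g] / counts[g] if counts[g] > 0 else math.nan for g in groups]


x = [1.0, 3.0, math.nan, 10.0, math.nan]
pid = ["a", "a", "a", "b", "c"]
print(groupmeans(x, pid))  # [2.0, 2.0, 2.0, 10.0, nan]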

X = groupvars(varname [,"dataset"=D]);  X is a column vector obtained by calculating the variance, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and missing values are generated for any group in which varname has only missing values. If D is not provided then the function operates on the first dataset available in the current workspace.

X = groupsds(varname [,"dataset"=D]);  X is a column vector obtained by calculating the standard deviation, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and missing values are generated for any group in which varname has only missing values. If D is not provided then the function operates on the first dataset available in the current workspace.

X = groupmedians(varname [,"dataset"=D]);  X is a column vector obtained by calculating the median, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and missing values are generated for any group in which varname has only missing values. If D is not provided then the function operates on the first dataset available in the current workspace.

X = groupsums(varname [,"dataset"=D]);  X is a column vector obtained by calculating the sum, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and missing values are generated for any group in which varname has only missing values. If D is not provided then the function operates on the first dataset available in the current workspace.

X = groupcounts(varname [,"dataset"=D]);  X is a column vector obtained by counting the number of observations, per group, of the variable varname in dataset D. X has the same length as the number of observations in D, and the count for each group excludes any observations with missing values on varname. If D is not provided then the function operates on the first dataset available in the current workspace.
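A matching Python sketch for groupcounts, under the same assumptions about explicit group labels and NaN-coded missing values. The table says only that missing observations are excluded from each count, so returning 0 for a group with only missing values is an assumption of this sketch.

```python
import math
from collections import Counter


# Sketch of groupcounts semantics: per-group number of non-missing values,
# repeated for every observation. Assumes missing values are NaN; a group with
# only missing values gets a count of 0 here (an assumption, since the
# documentation does not say what BayES returns in this case).
def groupcounts(values, groups):
    counts = Counter(g for v, g in zip(values, groups) if not math.isnan(v))
    return [counts[g] for g in groups]


x = [1.0, 3.0, math.nan, 10.0, math.nan]
pid = ["a", "a", "a", "b", "c"]
print(groupcounts(x, pid))  # [2, 2, 2, 1, 0]
```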
