B.13 Statements for working with datasets
For all practical purposes, datasets in BayES are matrices with additional structure. This means that if, for example, D is a dataset, then indexing operations and functions that operate on matrices work on D exactly as they would if D were a matrix. There are, however, some additional functions and statements that work on datasets but not on matrices. These are documented in the following table.
Syntax | Arguments and performed function |
X = D.varname; | X is a column vector with entries equal to the values of variable varname in dataset D.
|
D.varname = <math expression>; | Creates a new variable called varname (defined by <math expression>) and adds it to dataset D. If D already has a variable called varname then its values are replaced.
|
clear(D.varname); | Deletes the variable called varname from dataset D.
|
D = dataset(A , {ID1, ID2, ...}); | D is a dataset constructed from the data contained in matrix A. ID1, ID2, ... is a list of id values, enclosed in curly brackets, to be used as the variable names. If variable names are not provided then the variables are named _V1, _V2, etc.
see also import |
rename(D.oldname, newname); | Renames variable oldname in dataset D to newname.
|
keepif(<condition> ,"dataset"=D); | Keeps the observations in dataset D that satisfy the logical <condition>. The remaining observations (those that do not satisfy <condition>) are permanently deleted from D. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also dropif and dropmissing |
dropif(<condition> ,"dataset"=D); | Drops (permanently deletes) the observations in dataset D that satisfy the logical <condition>. The remaining observations (those that do not satisfy <condition>) are retained. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also keepif and dropmissing |
W = dropmissing(X); | W is a dataset constructed by reading the entries of X, row by row, but skipping any rows in X that contain at least one missing value. An error is produced if an empty dataset results from dropping the rows of X with missing values. When the argument provided to dropmissing() is a matrix, the function returns a matrix; for this reason the function is also documented in Section B.5.
see also dropif and keepif |
sortd(<variables> ,"dataset"=D); | Sorts the data in dataset D in ascending order according to the values of the variables, the names of which are provided in the <variables> list. If D is not provided then the function operates on the first dataset available in the current workspace. The statement has no return value and the data in D are overwritten.
see also sort and sortrows |
summary(<variables> ,"dataset"=D); | Calculates and prints summary statistics of the variables in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.
|
set_ts(tid | Declares the dataset as a time series, with tid being the variable that identifies time periods. If D is not provided then the function operates on the first dataset available in the current workspace.
|
set_pd(tid, pid | Declares the dataset as a panel, with tid being the variable that identifies time periods and pid the variable that identifies groups. If D is not provided then the function operates on the first dataset available in the current workspace.
|
set_cs("dataset"=D); | Clears any time structure from dataset D that was set by a call to either the set_ts or the set_pd function. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = lag(varname ,l,"dataset"=D); | X is a column vector obtained by taking lags of length l of the variable named varname in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = diff(varname | X is a column vector obtained by taking seasonal differences of order o and seasonal length l of the variable named varname in dataset D. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupmeans(varname | X is a column vector obtained by calculating the arithmetic mean per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and missing values are generated if varname has only missing values for a particular group. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupvars(varname | X is a column vector obtained by calculating the variance per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and missing values are generated if varname has only missing values for a particular group. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupsds(varname | X is a column vector obtained by calculating the standard deviation per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and missing values are generated if varname has only missing values for a particular group. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupmedians(varname | X is a column vector obtained by calculating the median per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and missing values are generated if varname has only missing values for a particular group. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupsums(varname | X is a column vector obtained by calculating the sum per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and missing values are generated if varname has only missing values for a particular group. If D is not provided then the function operates on the first dataset available in the current workspace.
|
X = groupcounts(varname | X is a column vector obtained by calculating the number of observations per group of the variable with name varname from dataset D. X has the same length as the number of observations in D and the number of observations per group excludes any observations with missing values on varname. If D is not provided then the function operates on the first dataset available in the current workspace.
|
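The group functions in the table return one value per observation, not one value per group. The following Python sketch (not BayES code) mimics the behavior described for groupmeans and groupcounts, to make the shape and missing-value rules concrete. The names group_means, group_counts and MISSING are hypothetical, missing values are represented by NaN, and it is assumed that missing entries are skipped when a group has at least one non-missing value; the table only states what happens when all values in a group are missing.

```python
import math
from collections import defaultdict

MISSING = float("nan")  # assumption: NaN stands in for a missing value


def group_counts(values, groups):
    """Number of non-missing observations per group, repeated for
    every observation that belongs to that group."""
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        if not math.isnan(v):
            counts[g] += 1
    return [counts[g] for g in groups]


def group_means(values, groups):
    """Arithmetic mean per group, repeated for every observation that
    belongs to that group. A group with only missing values yields
    MISSING. Skipping missing entries otherwise is an assumption."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        if not math.isnan(v):
            totals[g] += v
            counts[g] += 1
    return [totals[g] / counts[g] if counts[g] > 0 else MISSING
            for g in groups]


# Six observations in three groups; group 3 has only a missing value.
pid = [1, 1, 1, 2, 2, 3]
x = [1.0, 3.0, MISSING, 4.0, 6.0, MISSING]

means = group_means(x, pid)    # same length as x
counts = group_counts(x, pid)  # same length as x
print(means)
print(counts)
```

In this sketch both results have six entries, one per observation: the mean of group 1 is repeated three times, and the single observation in group 3 receives a missing mean and a count of zero.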