Discrete Exponential Family Model (DEFM)

Discrete Exponential Family Models (DEFMs) are exponential-family models for discrete data. Here, we work with binary arrays, which can represent, among other things, networks and multivariate binary Markov processes.

new_defm_cpp(id, Y, X, order = 1L, copy_data = TRUE)

init_defm(m, force_new = TRUE)

print_stats(m, i = 0L)

nterms_defm(m)

nrow_defm(m)

ncol_defm_y(m)

ncol_defm_x(m)

nobs_defm(m)

morder_defm(m)

new_defm(id, Y, X, order = 1)

Arguments

id: Integer vector of length n. Observation ids, for example, person id.
Y: 0/1 matrix of responses with n rows and n_y columns.
X: Numeric matrix of covariates with n rows and n_x columns.
order: Integer. Order of the Markov process; 1 by default.
copy_data: Logical scalar. When TRUE (default), the data is copied into the model; otherwise, the model keeps a pointer to the data (see details).
m: An object of class DEFM.
force_new: Logical scalar. When TRUE (default), no cache is used when adding new arrays (see details).
i: An integer scalar indicating which set of statistics to print (see details).

Value

An external pointer of class DEFM.

nterms_defm returns the number of terms in the model.

nrow_defm returns the number of rows in the model.

ncol_defm_y returns the number of output variables in the model.

ncol_defm_x returns the number of covariates in the model.

nobs_defm returns the number of observations (events) in the model.

morder_defm returns the order of the Markov process.

Details

The id vector groups the observations. For example, in a dataset with multiple individuals, id should contain the individual ids, so that rows sharing an id belong to the same individual. The Y matrix contains the binary responses, with one column per response variable. The X matrix contains the covariates used to model the responses. The order parameter sets the order of the Markov process, that is, how many previous observations are used to predict the current one.
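As a rough illustration of this layout, the base R sketch below builds toy inputs for two individuals observed three times each. The data and the names markov_order and rows_with_history are made up for illustration, and the sketch assumes that X, like Y, has one row per observation. The last line counts how many rows per individual have a full history under a first-order process; how the package itself counts events is not stated here.

```r
set.seed(1)

# Two individuals (ids 1 and 2), three consecutive observations each: n = 6
id <- rep(1:2, each = 3)

# Y: binary responses, n rows and n_y = 2 columns (one per response variable)
Y <- matrix(rbinom(6 * 2, size = 1, prob = 0.5), nrow = 6, ncol = 2)

# X: covariates; assumed here to have one row per observation (n_x = 1)
X <- matrix(rnorm(6), nrow = 6, ncol = 1)

# Order of the Markov process: one previous observation is used as history
markov_order <- 1L

# Rows per individual that have `markov_order` preceding rows as history
rows_with_history <- as.vector(table(id)) - markov_order
```

With three rows per individual and order 1, each individual has two rows with a complete one-step history.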

The copy_data parameter specifies whether the data is copied into the model or referenced through a pointer. If copy_data is TRUE, the data is copied into the model, which is safer but duplicates the data in memory. If copy_data is FALSE, the model uses the data as a pointer, which avoids the copy and can be more efficient, but is dangerous if the data is removed while the model is still in use.
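A minimal sketch of the two modes follows, using the new_defm_cpp signature listed above. The toy data is made up, the calls are guarded so the block is skipped when the defm package is not installed, and it is an assumption that new_defm_cpp accepts an integer id vector, an integer 0/1 matrix, and a numeric matrix exactly as built here.

```r
set.seed(2)
id <- rep(1:2, each = 3)                                   # integer ids
Y  <- matrix(rbinom(12, size = 1, prob = 0.5), nrow = 6)   # integer 0/1 matrix
X  <- matrix(rnorm(6), nrow = 6)                           # numeric covariates

if (requireNamespace("defm", quietly = TRUE)) {
  # Default: the model keeps its own copy of id, Y, and X
  m_copy <- defm::new_defm_cpp(id, Y, X, order = 1L, copy_data = TRUE)

  # Pointer mode: no copy is made, so id, Y, and X must not be removed
  # for as long as m_ptr is in use
  m_ptr <- defm::new_defm_cpp(id, Y, X, order = 1L, copy_data = FALSE)
}
```

In pointer mode, keeping id, Y, and X in the same scope as the model for its whole lifetime is the simplest way to avoid a dangling reference.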

The init_defm function initializes the model: it computes the sufficient statistics and prepares the model for fitting. The force_new parameter specifies whether the model should treat each added array as completely unique, even if it has the same support set as an existing array. This is an experimental feature and should be used with caution.

The print_stats function prints the support set of the i-th type of array in the model.

References

Vega Yon, G. G., Pugh, M. J., & Valente, T. W. (2022). Discrete Exponential-Family Models for Multivariate Binary Outcomes (arXiv:2211.00627). arXiv. https://arxiv.org/abs/2211.00627

Examples
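
The following is a hedged sketch and not the package's official example. It simulates a small dataset, creates a model with new_defm, and queries its dimensions with the accessor functions listed above. The data is made up, X is assumed to have one row per observation, and the defm calls run only if the package is installed. Calls to init_defm and print_stats are left out because initialization is assumed to come after model terms are added, and the term-adding functions are not covered in this section.

```r
set.seed(3)

# Three individuals with four observations each (n = 12)
id <- rep(1:3, each = 4)
Y  <- matrix(rbinom(12 * 2, size = 1, prob = 0.5), nrow = 12, ncol = 2)
X  <- matrix(rnorm(12), nrow = 12, ncol = 1)

if (requireNamespace("defm", quietly = TRUE)) {
  # Create a first-order model; the result is an external pointer of class DEFM
  m <- defm::new_defm(id, Y, X, order = 1)

  # Query basic dimensions of the model
  print(defm::nrow_defm(m))    # number of rows
  print(defm::ncol_defm_y(m))  # number of output variables
  print(defm::ncol_defm_x(m))  # number of covariates
  print(defm::morder_defm(m))  # order of the Markov process
}
```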