Expanding the ERGM Framework

Modeling Interrelated Health Outcomes with Jointly-Distributed Binary Data

Sunbelt 2025
Paris, France

George G. Vega Yon

Thomas W. Valente

Jacob Kean

Mary Jo Pugh

2025-06-26

Motivation

We are running a study with multiple 0/1 outcomes, e.g., for person \(i\) at time \(t\):

\[ \left(\mbox{tobacco}, \mbox{alcohol}, \mbox{marijuana}\right)_{it} = \left(0, 1, 0\right) \]
We want to understand the factors that influence prevalence.
And the outcomes may not be independent.
Some approaches to modeling this type of data exist (e.g., SEM, multivariate regression). Here we leverage ERGMs!

Types of questions

While modeling outcomes jointly, we can ask questions such as:

Question	Representation
How prevalent is the co-occurrence?	\(\{A_i, B_i\}\)
Same as above, but for females?	\(\{A_i, B_i\} \times female_i\)
How common is the transition?	\(\{A_{i,t}, \text{not }B_{i,t}\} \to \{A_{i,t + 1}, B_{i, t + 1}\}\)
What about reversing?	\(\{A_{i,t}, B_{i,t}\} \to \{A_{i,t + 1},\text{not }B_{i, t + 1}\} \to \{A_{i,t + 2}, B_{i,t + 2}\}\)

Background

Exponential Random Graph Models [ERGMs]

The ERGM framework¹ has been used for a variety of applications.
Multiple ERGM extensions and other advances in modeling entities jointly exist. A few to highlight:
- Generalized Location Systems [GLS]².
- (multivariate)⁵ Auto-logistic Actor Attribute Models [ALAAM]³.
- Exponential-family Random Network Models [ERNMs]⁴.
- Latent models (Gollini and Caimo, Sunbelt 2025).⁵
Our work is yet another extension: A bipartite network model where we map individuals to multiple outcomes.

ERGMs (cont.)

In case you missed them from earlier talks 🤭

[Figure: a diagram showing the parts of an ERGM.]

Data structure

For each individual \(i\), with \(T\) time points and \(K\) outcomes, we have a data structure that looks like this:

(A) Data structure \(T\) time points and \(K\) outcomes

\[ \begin{aligned} & \qquad\mbox{Outcomes }\rightarrow \\ \mbox{Time }\downarrow & \left[\begin{array}{cccc} y_{i,1,1} & y_{i,1,2} & \dots & y_{i,1,K} \\ y_{i,2,1} & y_{i,2,2} & \dots & y_{i,2,K} \\ \vdots & \vdots & \ddots & \vdots \\ y_{i,T,1} & y_{i,T,2} & \dots & y_{i,T,K} \end{array}\right] \end{aligned} \]

Each row represents a time point, and each column a particular outcome.

(B) Example with three outcomes and two time points

\[ \begin{aligned} \begin{array}{r} t = 1 \\ t = 2 \end{array} & \left[\begin{array}{ccc} \text{Alcohol} & \text{Tobacco} & \text{Marijuana} \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{array}\right] \end{aligned} \]

This case represents the transition from only consuming tobacco to consuming tobacco and alcohol.
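
As a minimal sketch in base R (the object names are hypothetical, not part of any package), the two-time-point example can be stored as a matrix with one row per time point and one column per outcome, and the transition can be checked directly:

```r
# Hypothetical data for one individual: rows are time points, columns are outcomes
y_i <- matrix(
  c(0, 1, 0,
    1, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2"), c("alcohol", "tobacco", "marijuana"))
)

# Transition of interest: tobacco only at t = 1, tobacco and alcohol at t = 2
tobacco_only_t1    <- all(y_i["t1", ] == c(0, 1, 0))
tobacco_alcohol_t2 <- all(y_i["t2", ] == c(1, 1, 0))
has_transition     <- tobacco_only_t1 && tobacco_alcohol_t2
has_transition
```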

Data structure (bis)

The full likelihood of the model (multilevel/pooled ERGM) is given by:

\[ \mathbb{P}_{\mathcal{Y}}\left(\mathbf{Y} = \mathbf{y};\boldsymbol{\theta}\right) = \left[\prod_{i,t>1} \overbrace{\mbox{exp}\left\{\boldsymbol{\theta} s\left(\mathbf{y}_{i,t}\right)\right\} \times \kappa\left(\boldsymbol{\theta}\right)^{-1}}^{\text{Individual probability for } t>1}\right]\times \left[\prod_{i} \overbrace{\mathbb{P}_{\mathcal{Y}}\left(\mathbf{y}_{i,1};\boldsymbol{\theta}\right)}^{\text{Baseline prevalence}} \right] \]
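
To make the individual-probability factor concrete, here is a rough base R sketch for a small case with three outcomes. It assumes each factor is an exponential of parameter-weighted sufficient statistics divided by the normalizing constant \(\kappa(\boldsymbol{\theta})\), which is computed by brute-force enumeration. The statistics, parameter values, and data are made up for illustration, and the baseline term is left out. This is not how `defm` is implemented.

```r
# All 2^K possible outcome vectors for K = 3 binary outcomes (the support)
K <- 3
support <- as.matrix(expand.grid(rep(list(0:1), K)))
colnames(support) <- paste0("y", 0:(K - 1))

# Hypothetical sufficient statistics: one intercept per outcome plus one
# co-occurrence term for {y1, y2}
suff_stats <- function(y) c(y, y[2] * y[3])

# Log of the normalizing constant kappa(theta), by enumerating the support
log_kappa <- function(theta) {
  s_all <- t(apply(support, 1, suff_stats))
  log(sum(exp(s_all %*% theta)))
}

# Pooled log-likelihood: sum over independent observations (rows of Y)
loglik <- function(theta, Y) {
  s_obs <- t(apply(Y, 1, suff_stats))
  sum(s_obs %*% theta) - nrow(Y) * log_kappa(theta)
}

# Made-up parameters and observations, for illustration only
theta <- c(-0.5, -2, -2, 1.5)
Y <- rbind(c(0, 1, 1), c(1, 0, 0), c(0, 0, 0))
loglik(theta, Y)

# Sanity check: the probabilities over the support add up to one
probs <- exp(t(apply(support, 1, suff_stats)) %*% theta - log_kappa(theta))
sum(probs)
```

Enumeration is only feasible because the support is small. With many outcomes the number of terms in \(\kappa(\boldsymbol{\theta})\) grows as a power of two.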

Implementation

The `defm` R package

defm: Discrete Exponential-Family Models.
Part of barry (your C++ motif accountant).
As in Vega Yon, Slaughter, and de la Haye (2021), it uses maximum likelihood estimation (MLE) to fit the model.
barry was optimized to fit pooled models (sna and even genetics!).
It has been used in models featuring up to 30 entries in the adjacency matrix (e.g., 15 outcomes at 2 time points), i.e., \(2^{30}\) or about 1,000,000,000 combinations.
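
The "about 1,000,000,000 combinations" figure follows from counting binary arrays: each of the 30 entries can be 0 or 1. A quick check in base R (the variable names are only illustrative):

```r
# Each entry of the array is binary, so the number of possible arrays is 2^entries
n_outcomes   <- 15
n_timepoints <- 2
n_entries    <- n_outcomes * n_timepoints

support_size <- 2^n_entries  # 2^30 = 1,073,741,824
format(support_size, big.mark = ",")
```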

Example

The following example uses data from the SNS study (Valente et al. 2013):

# Loading the package and the data
library(defm)
data(valentesnsList)

# Building the model: exposure effects, outcome intercepts, and one transition motif
mymodel <- with(
  valentesnsList,
  new_defm(id = id, Y = Y, X = X, order = 1)
) |>
  term_defm_logit_intercept(idx = 'exposure_drink', coords = 0) |>
  term_defm_logit_intercept(idx = 'exposure_smoke', coords = 1) |>
  term_defm_logit_intercept(idx = 'exposure_mj', coords = 2) +
  "{y0}" + "{y1}" + "{y2}" +
  "{y1, 0y2} > {y1, y2}"

Num. of Arrays       : 1161
Support size         : 650
Support size range   : [8, 8]
Arrays in powerset   : 5200
Transform. Fun.      : no
Model terms ( 7)    :
 - Logit intercept alcohol x exposure_drink
 - Logit intercept tobacco x exposure_smoke
 - Logit intercept mj x exposure_mj
 - Logit intercept alcohol
 - Logit intercept tobacco
 - Logit intercept mj
 - Motif {tobacco⁺, mj⁻}⇨{tobacco⁺, mj⁺}
Model rules (1)    :
 - Markov model of order 1
Model Y variables (3):
  0) alcohol
  1) tobacco
  2) mj
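
The printed label for the transition term suggests how to read the formula syntax: a bare name (e.g., `y1`) requires the outcome to be present, a `0` prefix (e.g., `0y2`) requires it to be absent, and `>` separates time \(t\) from time \(t + 1\). The base R sketch below counts such a transition in a single individual's matrix. `count_transition()` is a hypothetical helper written for illustration. It is not part of `defm` and may differ from how the package computes its statistics.

```r
# Count occurrences of a transition motif in one individual's T x K matrix.
# `from` and `to` are named 0/1 vectors with the required values at t and t + 1.
# Outcomes not named in `from` or `to` are left unconstrained.
count_transition <- function(y, from, to) {
  n_t <- nrow(y)
  if (n_t < 2) return(0L)
  hits <- vapply(seq_len(n_t - 1), function(t) {
    all(y[t, names(from)] == from) && all(y[t + 1, names(to)] == to)
  }, logical(1))
  sum(hits)
}

# Hypothetical individual observed at three time points
y <- matrix(
  c(0, 1, 0,
    0, 1, 1,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("alcohol", "tobacco", "mj"))
)

# "{y1, 0y2} > {y1, y2}": tobacco without mj, followed by tobacco with mj
n_hits <- count_transition(
  y,
  from = c(tobacco = 1, mj = 0),
  to   = c(tobacco = 1, mj = 1)
)
n_hits
```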

Example (cont.)

The Logit intercepts are specific to individual outcomes.
Without motifs, the model reduces to a set of independent logistic regressions.
In this example, we are using longitudinal data, so we can use the transition motifs.
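
The reduction to independent logistic regressions can be checked numerically. The sketch below uses made-up intercepts and no covariates. It compares the joint probabilities from the exponential-family form with the product of independent Bernoulli probabilities.

```r
# Hypothetical intercepts for three outcomes (no motif terms)
theta <- c(alcohol = -0.6, tobacco = -2.5, mj = -2.1)

# Enumerate the 2^3 possible outcome vectors
support <- as.matrix(expand.grid(alcohol = 0:1, tobacco = 0:1, mj = 0:1))

# Joint probabilities under the exponential-family form exp{theta * y} / kappa
weights <- exp(support %*% theta)
p_joint <- as.vector(weights / sum(weights))

# Product of independent Bernoulli probabilities with P(y_k = 1) = plogis(theta_k)
p_k <- plogis(theta)
p_indep <- apply(support, 1, function(y) prod(ifelse(y == 1, p_k, 1 - p_k)))

all.equal(p_joint, unname(p_indep))
```

Once a term involving two or more outcomes is added, the joint probabilities no longer factorize this way.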

Statistical models
	Model 1
Logit intercept alcohol x exposure_drink	0.98 (0.17)***
Logit intercept tobacco x exposure_smoke	1.78 (0.31)***
Logit intercept mj x exposure_mj	1.89 (0.27)***
Logit intercept alcohol	-0.61 (0.08)***
Logit intercept tobacco	-2.50 (0.12)***
Logit intercept mj	-2.13 (0.11)***
Motif {tobacco⁺, mj⁻}⇨{tobacco⁺, mj⁺}	3.74 (0.35)***
AIC	3258.19
BIC	3293.59
N	1161
***p < 0.001; **p < 0.01; *p < 0.05
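
As a rough reading aid, the estimates can be turned into Wald z-statistics and multiplicative changes in the odds. This assumes the significance stars come from a normal approximation, which the slides do not state. The estimates and standard errors below are copied from the table, and the vector names are only labels.

```r
# Estimates and standard errors copied from the table above
est <- c(
  alcohol_x_exposure_drink = 0.98,
  tobacco_x_exposure_smoke = 1.78,
  mj_x_exposure_mj         = 1.89,
  motif_tobacco_to_mj      = 3.74
)
se <- c(0.17, 0.31, 0.27, 0.35)

# Wald z-statistics and two-sided p-values (normal approximation, an assumption)
z       <- est / se
p_value <- 2 * pnorm(-abs(z))

# exp(estimate): multiplicative change in the odds, holding the other terms fixed
odds_mult <- exp(est)

round(cbind(est, se, z, odds_mult), 2)
```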

Health conditions in Military Servicemembers

Caution

The original dataset has been modified by randomly swapping outcome names to protect the privacy of the participants. No conclusions should be drawn from these analyses.

Data

We use a subset of the study (553 participants), which includes reports on a large number of health conditions from a large sample of US servicemembers.
For this example, we selected five health conditions: dizziness, headache, sleep problems, anxiety, and depression.
This is an ideal scenario, as it is unlikely that servicemembers know each other (independence between individuals).

Reported condition	0	1
Dizziness	77	1138
Headache	235	980
Sleep problems	69	1146
Anxiety	312	903
Depression	115	1100
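
For reference, the counts above can be converted into proportions with a few lines of base R. The sketch assumes that column 1 holds the number of reports in which the condition is present. Because the outcome names were swapped, the resulting figures are illustrative only.

```r
# Counts copied from the table above
counts <- data.frame(
  condition = c("Dizziness", "Headache", "Sleep problems", "Anxiety", "Depression"),
  n0 = c(77, 235, 69, 312, 115),
  n1 = c(1138, 980, 1146, 903, 1100)
)

# Every row should add up to the same number of reports
counts$total <- counts$n0 + counts$n1

# Share of reports in which the condition is present (assumed meaning of column 1)
counts$prevalence <- round(counts$n1 / counts$total, 3)
counts
```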

Model fitting

A combination of functions and special syntax

Cross-sectional model

# Logit intercepts
model |>
  term_defm_logit_intercept() |>
  term_defm_logit_intercept(idx = "hispanic") |>
  term_defm_logit_intercept(idx = "site_cpen", coords = c(0))

# Co-occurrence motifs
model +
  "{y0, y1}" +
  "{y0, y2}" +
  "{y0, y4}" +
  "{y2, y3}" +
  "{y3, y4}"

Panel model

# Logit intercepts
model |>
  term_defm_logit_intercept() |>
  term_defm_logit_intercept(idx = "age_yrs", coords = c(4)) |>  
  term_defm_logit_intercept(idx = "hispanic", coords = c(1))

# Co-occurrence motifs
model +
  "{y0, y1}" +
  "{y1, y2}" +
  "{y1, y3}" +
  "{y2, y4}" +
  "{y3, y4}"

# Transition motifs
model +
  "{y0, 0y3} > {y0, y3}" +
  "{y1, 0y3} > {y1, y3}" +
  "{0y3, y4} > {y3, y4}"

Preliminary results¹

                                                     Cross sectional     Panel             
-------------------------------------------------------------------------------------------
Logit terms                                                                                
    depression x age_yrs                                0.45 (0.14) **      0.44 (0.20) *  
    headache x hispanic                                 0.58 (0.23) *       0.74 (0.33) *  
Co-occurrence                                                                              
    {dizziness⁺, headache⁺}                             2.47 (0.29) ***     2.76 (0.42) ***
    {headache⁺, sleep_prob⁺}                            1.11 (0.27) ***     1.22 (0.36) ***
    {headache⁺, anxiety⁺}                               0.79 (0.16) ***     1.12 (0.22) ***
    {sleep_prob⁺, depression⁺}                          2.08 (0.29) ***     1.99 (0.39) ***
    {anxiety⁺, depression⁺}                             2.36 (0.25) ***     2.67 (0.34) ***
Transition (rare in this dataset)           
    {dizziness⁺, anxiety⁻}⇨{dizziness⁺, anxiety⁺}                          -3.26 (0.40) ***
    {headache⁺, anxiety⁻}⇨{headache⁺, anxiety⁺}                            -0.03 (0.45)    
    {anxiety⁻, depression⁺}⇨{anxiety⁺, depression⁺}                        -2.01 (0.39) ***
-------------------------------------------------------------------------------------------
AIC                                                  3655.00             1643.41           
BIC                                                  3714.86             1708.93           
N                                                    1084                 583              
===========================================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

“Enrollment age” and “hispanic” predict depression and headache, respectively.
There is evidence of co-occurrence between some outcomes.
Anxiety following dizziness is less likely.
No evidence of headache leading to anxiety.
Anxiety following previous depression is less likely.

Discussion

Today

We use bipartite ERGMs to model multiple health outcomes jointly.
We assume independence between individuals (multilevel/pooled model), so the approach is not suitable for data known to be correlated across individuals, like health behaviors!
The model is available in the defm R package https://github.com/UofUEpiBio/defm.
We demonstrated the model using data from the SNS study (Valente et al. 2013) and a (jittered) dataset of US servicemembers’ health conditions.

Next steps

Investigate the model's behavior using simulation studies.
(Re-)release the package on CRAN.
Relax the independence assumption (go for a proper ERGM!), i.e., like in ALAAMs.

Thanks!

Expanding the ERGM Framework

Modeling Interrelated Health Outcomes with Jointly-Distributed Binary Data

George G. Vega Yon, Thomas W. Valente, Jacob Kean, Mary Jo Pugh

george.vegayon@utah.edu

https://ggv.cl

@gvegayon.bsky.social

@gvegayon

Appendix

References

Almquist, Zack W., and Carter T. Butts. 2014. “Logistic Network Regression for Scalable Analysis of Networks with Joint Edge/Vertex Dynamics.” Sociological Methodology 44 (1): 273–321. https://doi.org/10.1177/0081175013520159.

Butts, Carter T. 2007. “9. Models for Generalized Location Systems.” Sociological Methodology 37 (1): 283–348. https://doi.org/10.1111/j.1467-9531.2006.00187.x.

Fellows, Ian E. 2012. “Exponential Family Random Network Models.” PhD thesis. https://www.proquest.com/dissertations-theses/exponential-family-random-network-models/docview/1221548720/se-2.

Fellows, Ian, and Mark S. Handcock. 2012. “Exponential-Family Random Network Models.” August 1, 2012. https://doi.org/10.48550/arXiv.1208.0121.

Frank, O, and David Strauss. 1986. “Markov graphs.” Journal of the American Statistical Association 81 (395): 832–42. https://doi.org/10.2307/2289017.

Holland, Paul W., and Samuel Leinhardt. 1981. “An exponential family of probability distributions for directed graphs.” Journal of the American Statistical Association 76 (373): 33–50. https://doi.org/10.2307/2287037.

Parker, Andrew, Francesca Pallotti, and Alessandro Lomi. 2022. “New Network Models for the Analysis of Social Contagion in Organizations: An Introduction to Autologistic Actor Attribute Models.” Organizational Research Methods 25 (3): 513–40. https://doi.org/10.1177/10944281211005167.

Robins, Garry, Philippa Pattison, and Peter Elliott. 2001. “Network Models for Social Influence Processes.” Psychometrika 66 (2): 161–89. https://doi.org/10.1007/BF02294834.

Robins, Garry, Pip Pattison, Yuval Kalish, and Dean Lusher. 2007. “An introduction to exponential random graph (p*) models for social networks.” Social Networks 29 (2): 173–91. https://doi.org/10.1016/j.socnet.2006.08.002.

Snijders, Tom A B, Philippa E Pattison, Garry L Robins, and Mark S Handcock. 2006. “New specifications for exponential random graph models.” Sociological Methodology 36 (1): 99–153. https://doi.org/10.1111/j.1467-9531.2006.00176.x.

Valente, Thomas W., Kayo Fujimoto, Jennifer B. Unger, Daniel W. Soto, and Daniella Meeker. 2013. “Variations in Network Boundary and Type: A Study of Adolescent Peer Influences.” Social Networks 35 (July): 309–16. https://doi.org/10.1016/j.socnet.2013.02.008.

Vega Yon, George G., Andrew Slaughter, and Kayla de la Haye. 2021. “Exponential Random Graph Models for Little Networks.” Social Networks 64 (January): 225–38. https://doi.org/10.1016/j.socnet.2020.07.005.

Wang, Zeyi, Ian E. Fellows, and Mark S. Handcock. 2023. “Understanding Networks with Exponential-Family Random Network Models.” Social Networks, August, S0378873323000497. https://doi.org/10.1016/j.socnet.2023.07.003.

Wasserman, Stanley, and Philippa Pattison. 1996. “Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*.” Psychometrika 61 (3): 401–25. https://doi.org/10.1007/BF02294547.

The core idea

\[ \left.\begin{array}{rl} L(\boldsymbol{\theta}_1) = & \mbox{Logit}^{-1}\left(\boldsymbol{\theta}_1 s\left(y_1\right)\right) \\ L(\boldsymbol{\theta}_2) = & \mbox{Logit}^{-1}\left(\boldsymbol{\theta}_2 s\left(y_2\right)\right) \\ & \dots \\ L(\boldsymbol{\theta}_K) = & \mbox{Logit}^{-1}\left(\boldsymbol{\theta}_K s\left(y_K\right)\right) \end{array}\right\}\mapsto \begin{array}{rl} L(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \dots, \boldsymbol{\theta}_K, \boldsymbol{\theta}_M) & = \\ & \hspace{-3cm}\mbox{exp}\left\{\sum_k \boldsymbol{\theta}_k s\left(y_k\right) + \boldsymbol{\theta}_M s\left(y_1,\dots, y_K\right)\right\}\times\kappa\left(\boldsymbol{\theta}\right)^{-1}, \\ \\ \mbox{where }\boldsymbol{\theta}& = \left[\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_K, \boldsymbol{\theta}_M\right] \end{array} \]

DEFMs are very close to logistic regression.
The only difference is the inclusion of terms that involve two or more outcomes, e.g., an interaction effect.
The right-hand side of the diagram shows how a set of independent logistic models can be combined and extended by incorporating a joint statistic \(s(\cdot)\) over several outcomes.
Moreover, if none of the sufficient statistics features more than one outcome, then the model reduces to a set of independent logistic regressions.
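
A small numerical sketch with two outcomes shows what a multi-outcome term adds. All parameter values are made up, and the co-occurrence parameter is called `theta_M` here. With the statistic \(y_1 y_2\), the conditional log-odds of \(y_2 = 1\) shift by `theta_M` when \(y_1 = 1\).

```r
# Made-up parameters: two intercepts and one co-occurrence (multi-outcome) term
theta_1 <- -1.0
theta_2 <- -0.5
theta_M <-  2.0

# Joint distribution over the four possible (y1, y2) pairs
support <- expand.grid(y1 = 0:1, y2 = 0:1)
w <- with(support, exp(theta_1 * y1 + theta_2 * y2 + theta_M * y1 * y2))
support$p <- w / sum(w)

# Conditional probability of y2 = 1 given y1 = v
p_y2_given <- function(v) with(support, p[y1 == v & y2 == 1] / sum(p[y1 == v]))
log_odds   <- function(p) log(p / (1 - p))

log_odds(p_y2_given(0))  # equals theta_2
log_odds(p_y2_given(1))  # equals theta_2 + theta_M
```

Setting `theta_M` to zero makes the two conditional log-odds identical, which is the independent logistic case.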
