Random Number Generation and Statistical Functions
Overview¶
Epiworld provides a comprehensive set of random number generation utilities and statistical functions that are essential for stochastic modeling. These utilities are used to simulate probabilistic events, generate proposals for Markov Chain Monte Carlo (MCMC) methods, and compute likelihoods or kernel functions. The supported distributions include uniform, normal, gamma, and others, which are implemented using the std::mt19937 random engine and standard C++ distribution classes.
Distributions¶
Epiworld provides a set of statistical functions and random distributions that are essential for modeling stochastic processes in epidemiological simulations. These distributions are used to model various probabilistic events, such as the time between infections, the number of contacts per agent, or the likelihood of recovery. The library includes implementations for common distributions such as Poisson, uniform, and others, which are either directly implemented or can be extended using the random number generation utilities.
For preforming inference in scenarios where the likelihood function is intractable or unavailable, refer to the documentation for Likelihood-Free Markov Chain Monte Carlo (LFMCMC).
Poisson Distribution¶
The Poisson distribution is used to model the number of events occurring within a fixed interval of time or space, given a constant average rate of occurrence. In Epiworld, the dpois function computes the Poisson probability for a given number of events k and rate parameter lambda.
#include <iostream>
#include "epiworld/math/distributions.hpp"
int main() {
int k = 5; /* Number of events. */
double lambda = 3.0; /* Average rate of occurrence. */
double probability = dpois(k, lambda);
std::cout << "Poisson probability: " << probability << "\n";
return 0;
}
The dpois function also supports returning the logarithm of the probability by setting the as_log parameter to true. This is useful for numerical stability in computations involving very small probabilities.
Generation Interval Probably¶
The dgenint function calculates the probability of a generation interval, which is the time between the infection of a source and the infection of a target. This function is particularly useful for modeling the dynamics of disease transmission.
#include <iostream>
#include "epiworld/math/distributions.hpp"
int main() {
int g = 3; /* Generation interval. */
double S = 1000; /* Population size. */
double p_c = 0.01; /* Probability of contact. */
double p_i = 0.05; /* Probability of infection. */
double p_r = 0.02; /* Probability of recovery. */
double p_0_approx = -1.0; /* Approximation of the probability of not being infected. */
double normalizing = -1.0; /* Normalizing constant. */
double probability = dgenint(g, S, p_c, p_i, p_r, p_0_approx, normalizing);
std::cout << "Generation interval probability: " << probability << "\n";
return 0;
}
The dgenint function accounts for various factors, including the population size, contact probability, infection probability, and recovery probability. It also supports normalization to ensure that the probabilities sum to one.
Mean of the Generation Interval¶
The gen_int_mean function computes the mean of the generation interval, which is a key metric for understanding the dynamics of disease spread.
#include <iostream>
#include "epiworld/math/distributions.hpp"
int main() {
double S = 1000; // Population size
double p_c = 0.01; // Probability of contact
double p_i = 0.05; // Probability of infection
double p_r = 0.02; // Probability of recovery
double mean = gen_int_mean(S, p_c, p_i, p_r);
std::cout << "Mean generation interval: " << mean << std::endl;
return 0;
}
This function iterates over possible generation intervals, calculating their probabilities using dgenint and summing the weighted intervals to compute the mean.
Likelihood-Free Markov Chain Monte Carlo (LFMCMC)¶
The LFMCMC module in Epiworld is meant for performing inference in scenarios where the likelihood function is intractable or unavailable. It relies on random number generation to propose new parameter values, simulate data, and evaluate kernel functions. The module supports various proposal and kernel functions, which can be customized to suit specific modeling needs.
Random Distributions¶
The LFMCMC module integrates several random distributions for generating proposals and simulating data:
Uniform Distribution¶
Used for generating random values within a specified range. Example: runif() generates a random value in $[0, 1)$, while runif(lb, ub) generates a value in $[lb, ub)$.
Normal Distribution¶
Used for generating proposals in MCMC. Example: rnorm() generates a standard normal value, while rnorm(mean, sd) generates a value with the specified mean and standard deviation.
Gamma Distribution¶
Used for modeling waiting times or other positive-valued processes. Example: rgamma(alpha, beta) generates a value from a gamma distribution with shape alpha and scale beta.
Seeding and Reproducibility¶
Like everything else in Epiworld (including the above), the LFMCMC module ensures reproducibility by allowing users to seed the random engine. This is achieved using the seed method, which initializes the std::mt19937 engine with a fixed seed value. By setting the same seed, users can ensure that the sequence of random numbers remains consistent across runs.
#include "epiworld/math/lfmcmc/lfmcmc-bones.hpp"
#include <iostream>
int main() {
LFMCMC<int> lfmcmc;
/* Seed the random engine. */
lfmcmc.seed(42);
/* Generate random values. */
std::cout << "Uniform random value: " << lfmcmc.runif() << "\n";
std::cout << "Normal random value: " << lfmcmc.rnorm(0, 1) << "\n";
std::cout << "Gamma random value: " << lfmcmc.rgamma(2, 2) << "\n";
return 0;
}
Proposal Functions¶
Proposal functions in LFMCMC define how new parameter values are generated during the MCMC process. The module includes several built-in proposal functions.
Normal Proposal¶
Generates proposals by adding a normally distributed random value to the current parameter.
std::vector<epiworld_double> new_params, old_params = {1.0, 2.0};
lfmcmc.set_proposal_fun(proposal_fun_normal<int>);
lfmcmc.run(new_params, old_params, &lfmcmc);
Uniform Proposal¶
Generates proposals within a specified range around the current parameter.
Reflective Normal Proposal¶
Ensures that proposals remain within specified bounds by reflecting values that exceed the bounds.
auto reflective_proposal = make_proposal_norm_reflective<int>(0.1, 0.0, 1.0);
lfmcmc.set_proposal_fun(reflective_proposal);
Kernel Functions¶
Kernel functions evaluate the similarity between simulated and observed data. They are used to compute the acceptance probability in LFMCMC. The module includes several kernel functions.
Uniform Kernel¶
Accepts proposals if the Euclidean distance between simulated and observed statistics is below a threshold.
Gaussian Kernel¶
Computes a Gaussian-weighted similarity score.
Example¶
The run method executes the LFMCMC algorithm, generating samples from the posterior distribution of parameters. Users must specify the initial parameter values, the number of samples, and the kernel threshold.
std::vector<epiworld_double> initial_params = {0.5, 0.5};
size_t n_samples = 1000;
epiworld_double epsilon = 0.1;
lfmcmc.run(initial_params, n_samples, epsilon, 42);
During the run, the module generates proposals, simulates data, evaluates kernel functions, and accepts or rejects proposals based on the Hastings ratio.