Performance Optimization

Performance profiling in epiworld focuses on identifying computational bottlenecks in simulation loops and agent update routines. Since the library is header-only and heavily templated, traditional profiling tools that depend on symbol-level separation are less effective. Instead, profiling is typically done through compiler-supported instrumentation (such as -pg with gprof) or runtime sampling tools (such as perf, valgrind, or Intel VTune) when running compiled executables that use the library. For lightweight insight, users can also instrument specific regions in the code with wall-clock timers from the C++ standard library (e.g., std::chrono) to measure step-level performance or to compare different update strategies. Continuous integration pipelines include code coverage reporting through Codecov.
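For example, a minimal std::chrono harness along these lines can bracket a region of interest. This is only a sketch: run_one_step() below is a stand-in for whatever model routine is being measured, not an epiworld function.

```cpp
#include <chrono>
#include <iostream>

// Stand-in for the work being profiled; in practice this would be a call
// into the model (e.g., advancing the simulation by one day).
static void run_one_step()
{
    volatile double x = 0.0;
    for (int i = 0; i < 100000; ++i)
        x = x + static_cast<double>(i) * 1e-6;
}

int main()
{
    using clock = std::chrono::steady_clock;
    const int ndays = 100;

    const auto start = clock::now();
    for (int day = 0; day < ndays; ++day)
        run_one_step();
    const std::chrono::duration<double> elapsed = clock::now() - start;

    std::cout << "total: " << elapsed.count() << " s, "
              << (elapsed.count() / ndays) * 1e3 << " ms/step\n";
    return 0;
}
```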

Memory Usage Optimization

Memory management in epiworld emphasizes minimizing allocation overhead and data duplication, which is critical in large-scale agent-based models. The library structures its data around contiguous containers such as std::vector and raw C-style arrays, which allow for efficient iteration and relatively predictable cache performance. Dynamic memory use is reduced by preallocating storage for agents, states, and viruses before the simulation begins (or in as broad a scope as possible), based on known population sizes and model configuration.
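The same pattern is easy to apply in user code. The sketch below is illustrative only (the Agent struct and buffer names are hypothetical, not epiworld types): storage is reserved once, before the simulation loop, so iteration does not trigger further allocations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-agent record; the field layout is illustrative only.
struct Agent {
    int state;
    int virus_id;
};

int main()
{
    const std::size_t n_agents = 1'000'000;

    // Reserve contiguous storage up front so the simulation loop performs
    // no further allocations for the agent population.
    std::vector<Agent> agents;
    agents.reserve(n_agents);
    for (std::size_t i = 0; i < n_agents; ++i)
        agents.push_back(Agent{0, -1});

    // Scratch buffer reused every step instead of being reallocated.
    std::vector<std::size_t> newly_infected;
    newly_infected.reserve(n_agents);

    return 0;
}
```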

Because the library is template-based and header-only, most logic is inlined, and data structures are specialized/monomorphized at compile time. This eliminates unnecessary abstraction layers and reduces pointer chasing, improving both speed and memory locality. Models reuse internal buffers and avoid repeated construction of large containers, keeping per-step overhead small even for large populations.

Users running large experiments can further optimize memory by minimizing per-agent state complexity and reusing model objects between runs rather than constructing new ones. Since epiworld does not depend on dynamic memory allocators beyond the standard library, this approach helps maintain predictable memory footprints and avoids fragmentation over repeated simulations.
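A minimal sketch of that reuse pattern is shown below, using a hypothetical ToyModel rather than epiworld's actual classes: the object (and the buffers it owns) is constructed once and reset between replicates instead of being rebuilt each time.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model type used only to illustrate the reuse pattern; it is
// not epiworld's API. The point is that buffers owned by the object survive
// across runs, so repeated simulations avoid reallocating them.
struct ToyModel {
    std::vector<int> agent_state;

    explicit ToyModel(std::size_t n) : agent_state(n, 0) {}

    void reset()
    {
        std::fill(agent_state.begin(), agent_state.end(), 0);
    }

    void run(int ndays, std::uint64_t seed)
    {
        (void)ndays; (void)seed; // simulation loop would go here
    }
};

int main()
{
    ToyModel model(1'000'000);          // allocate once

    for (int rep = 0; rep < 50; ++rep) {
        model.reset();                  // reuse storage between runs
        model.run(100, 1000u + rep);
    }
    return 0;
}
```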

Parallel Execution Strategies

epiworld supports parallel execution through OpenMP, which is used to distribute work across agents and simulation steps where it is safe to do so. The library's design makes it straightforward to parallelize agent updates and contagion calculations, since many of these operations are independent. OpenMP pragmas are applied at critical loops, allowing the same code to execute on multiple cores with minimal user configuration.
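The sketch below shows the general shape of such a loop, not epiworld's internal code: each iteration touches only its own agent's data, so a single pragma is enough to distribute the work across threads.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: each agent's update depends solely on its own data,
// so iterations are independent and the loop can be split across threads.
void update_agents(std::vector<double> &susceptibility)
{
    const std::ptrdiff_t n =
        static_cast<std::ptrdiff_t>(susceptibility.size());

    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        susceptibility[i] *= 0.99;  // per-agent work, no shared writes
}

int main()
{
    std::vector<double> susceptibility(1'000'000, 1.0);
    update_agents(susceptibility);
    return 0;
}
```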

OpenMP support is enabled at compile time (via flags such as -fopenmp), while the number of threads and scheduling behavior can be controlled at runtime (via environment variables like OMP_NUM_THREADS). Because the core simulation loop avoids global locks and shared-state dependencies, scaling is nearly linear on multi-core systems for sufficiently large populations. In general, however, users can treat OpenMP as an implementation detail and need not account for it in client code.
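A short illustration of these controls, assuming a standard OpenMP toolchain, is given below; the build command in the comments is only an example.

```cpp
// Compile with OpenMP enabled, e.g.:
//   g++ -O2 -fopenmp my_model.cpp -o my_model
// and control the thread count at runtime with:
//   OMP_NUM_THREADS=8 ./my_model
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#endif

int main()
{
#ifdef _OPENMP
    // Alternatively, set the thread count programmatically.
    omp_set_num_threads(4);
    std::cout << "OpenMP enabled, max threads: "
              << omp_get_max_threads() << "\n";
#else
    std::cout << "Compiled without OpenMP; running serially.\n";
#endif
    return 0;
}
```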

Because of how the library is structured internally, and because parallel execution is only applied where it makes sense, correctness is relatively simple to maintain. OpenMP parallelism is used primarily to speed up simple accumulation arithmetic, so race conditions are far less of a concern than one might expect. The combination of template-level inlining and OpenMP parallelism allows epiworld to reach very high throughput, on the order of hundreds of millions of agent-day operations per second on typical hardware.
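The typical pattern for such accumulation is an OpenMP reduction, sketched below with a hypothetical infection counter: each thread accumulates a private partial sum that OpenMP combines at the end, so no locking is needed and no race occurs.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Sketch of the kind of accumulation the text describes: each thread keeps a
// private partial count and OpenMP combines them, so there is no race on
// `infected`.
std::size_t count_infected(const std::vector<int> &state, int infected_code)
{
    std::size_t infected = 0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(state.size());

    #pragma omp parallel for reduction(+:infected)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        if (state[i] == infected_code)
            ++infected;

    return infected;
}

int main()
{
    std::vector<int> state(1'000'000, 0);
    state[42] = 1; // mark one agent as infected for illustration
    std::cout << "infected: " << count_infected(state, 1) << "\n";
    return 0;
}
```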

Benchmarking Methodologies

At present, epiworld does not include a dedicated benchmarking suite. The examples included in the repository, such as helloworld.cpp and readme.cpp, serve as informal benchmarks, reporting elapsed time and throughput at the end of each simulation. These provide a consistent way to monitor performance across versions and environments. A benchmarks/ directory exists, but it is not yet populated.

Until a formal benchmarking system is implemented, users can measure performance externally using tools such as /usr/bin/time, perf, or custom C++ timing utilities based on std::chrono. Running example models with controlled parameters and fixed random seeds allows fair comparisons between compiler flags, thread counts, and machine configurations.
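A small, hypothetical timing harness of this kind might look as follows; run_simulation() is only a placeholder for invoking one of the example models with fixed parameters and seed, and the agent-day throughput figure mirrors what the examples report.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Placeholder for running an example model with fixed parameters and seed;
// only the timing scaffold around it is the point of this sketch.
static void run_simulation(std::uint64_t seed)
{
    volatile std::uint64_t x = seed;
    for (int i = 0; i < 5'000'000; ++i)
        x = x + static_cast<std::uint64_t>(i);
}

int main()
{
    const std::uint64_t seed = 1231;        // fixed seed for fair comparisons
    const std::size_t n_agents = 100'000;   // assumed population size
    const int ndays = 100;                  // assumed simulation length

    const auto start = std::chrono::steady_clock::now();
    run_simulation(seed);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    std::cout << "elapsed: " << elapsed.count() << " s, throughput: "
              << (static_cast<double>(n_agents) * ndays) / elapsed.count()
              << " agent-days/s\n";
    return 0;
}
```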

Future benchmarking work will likely include a standardized set of models run under controlled conditions, with timing, memory use, and scaling data automatically collected. This would make it easier to track performance regressions and validate the efficiency of OpenMP parallel execution across releases.