Data Collection and Analysis

Overview

The DataBase class is responsible for collecting, organizing, and exporting statistical data generated during simulations. This data encompasses a wide range of information, including the state of agents, the history of viruses and tools, and the transmission network. The DataBase class is tightly integrated with the Model class, which provides the context for the simulation, such as the agents, their states, and the interactions between them.

The DataBase class tracks data at multiple levels of granularity. At the agent level, it records the state of each individual and their transitions over time. At the population level, it aggregates this information into a snapshot of the overall state distribution. It also maintains a history of virus and tool events, such as the introduction of new variants or the application of interventions. Finally, it records the transmission network, capturing the source, target, and timing of each transmission event. Together, these records allow for detailed analysis of the simulation's behavior.
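To make the link between the agent level and the population level concrete, here is a minimal standalone sketch of how per-agent states could be aggregated into a state distribution. The function `count_states` and its signature are hypothetical and not part of epiworld; the library's internal bookkeeping may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an epiworld function): counts how many agents
// are currently in each state. agent_states[i] holds the state index of
// agent i, and n_states is the number of states in the model.
std::vector<int> count_states(
    const std::vector<std::size_t> & agent_states,
    std::size_t n_states
) {
    std::vector<int> counts(n_states, 0);
    for (std::size_t state : agent_states) {
        assert(state < n_states); // every agent must be in a known state
        ++counts[state];
    }
    return counts;
}
```

Calling such a function once per step and storing the result alongside the step number would yield a population-level history similar in spirit to what the text above describes.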

Recording During Simulations

The DataBase class automatically records data at each step of the simulation. This includes the current state of all agents, the distribution of states for each virus and tool, and the overall state distribution. To do so, the record method is called at the end of each simulation step and updates the internal data structures with the latest information, which keeps the data consistent with the simulation's progression.

In addition to automatic recording, the DataBase class provides methods for manually recording specific events. For instance, the record_virus method is used to register a new virus, while the record_tool method is used to register a new tool. These methods update the relevant data structures with information about the virus or tool, such as its name, sequence, and origin date. Similarly, the record_transmission method is used to record a transmission event, capturing the source, target, and timing of the event.

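#include "epiworld/database-bones.hpp"
#include "epiworld/model-bones.hpp"

using namespace epiworld; // the library's classes live in the epiworld namespace

int main() {
    Model<int> model(100); // Create a model with 100 agents
    DataBase<int> db(model); // Initialize the database

    Virus<int> virus("ExampleVirus");
    db.record_virus(virus); // Record a new virus

    // Record a transmission event. The arguments are assumed here to be
    // the source agent, the target agent, the virus id, and the source's
    // exposure date.
    db.record_transmission(1, 2, 0, 5);

    db.record(); // Record the current state of the simulation
    return 0;
}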

Exporting Data

The write_data method exports simulation results to external files. Users specify multiple output files, each corresponding to a specific type of data. For example, one file may contain information about viruses, while another contains the transmission network. The data is written in a tabular format, making it easy to import into statistical software or spreadsheet applications for further analysis.

db.write_data(
    "virus_info.csv",
    "virus_hist.csv",
    "tool_info.csv",
    "tool_hist.csv",
    "total_hist.csv",
    "transmission.csv",
    "transition.csv",
    "reproductive_number.csv",
    "generation_time.csv",
    "active_cases.csv",
    "outbreak_size.csv"
);

Each file contains a specific subset of the simulation data:

- virus_info.csv: Information about viruses, including their names, sequences, and origin dates.
- virus_hist.csv: A history of virus-related events, such as the introduction of new variants.
- tool_info.csv: Information about tools, including their names and sequences.
- tool_hist.csv: A history of tool-related events.
- total_hist.csv: The overall state distribution over time.
- transmission.csv: The transmission network, including the source, target, and timing of each event.
- transition.csv: The transition matrix, showing the counts or probabilities of transitions between states.
- reproductive_number.csv: The reproductive number for each case.
- generation_time.csv: The generation time, which is the time between the infection of the source and the infection of the target.

Basic Analysis

The DataBase class includes methods for performing basic analysis on the simulation data. These methods allow users to calculate metrics such as transition probabilities, reproductive numbers, and generation times. For example, the get_transition_probability method calculates the probability of transitioning from one state to another, while the get_reproductive_number method computes the reproductive number for each case.

std::vector<epiworld_double> transition_probs = db.get_transition_probability(true, true);
MapVec_type<int, int> reproductive_numbers = db.get_reproductive_number();

std::vector<int> agent_id, virus_id, time, gentime;
db.get_generation_time(agent_id, virus_id, time, gentime);

For example, transition probabilities indicate how likely agents are to move between states, while reproductive numbers indicate the potential for the infection to spread. Generation times can be used to study the timing of transmission events and their impact on the overall dynamics.
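As an illustration of what a transition probability is, the sketch below turns a matrix of observed transition counts into row-wise probabilities. It is a standalone example, not epiworld's implementation: the name `row_normalize` is hypothetical, and the row-major layout (`counts[from * n + to]`) is an assumption that may not match the layout the library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an epiworld function): converts a flat n x n
// matrix of transition counts into transition probabilities by dividing
// each entry by its row total. Assumes row-major storage, i.e. the count
// of moves from state `from` to state `to` sits at counts[from * n + to].
std::vector<double> row_normalize(
    const std::vector<int> & counts,
    std::size_t n
) {
    assert(counts.size() == n * n);
    std::vector<double> probs(n * n, 0.0);
    for (std::size_t from = 0; from < n; ++from) {
        long total = 0;
        for (std::size_t to = 0; to < n; ++to)
            total += counts[from * n + to];

        if (total == 0)
            continue; // no observations for this state: leave the row at zero

        for (std::size_t to = 0; to < n; ++to)
            probs[from * n + to] =
                static_cast<double>(counts[from * n + to]) / total;
    }
    return probs;
}
```

Each row with at least one observation sums to one, so entry (from, to) can be read as the estimated probability of moving to `to` given that an agent is currently in `from`.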

In-depth analysis is usually better done outside C++. Either export the model data (see Exporting Data above) and import it into a tool of your choice (e.g., R, Excel, or Python), or rewrite the model using the epiworldR or epiworldPy bindings.

Transmission Network Analysis

The get_transmissions method retrieves data about the transmission network, including the source, target, and timing of each event. This data can be used to construct transmission networks and analyze the spread of the infection. For example, users can visualize the network to identify clusters of infections or calculate network metrics such as centrality and clustering coefficients.

// Output vectors: get_transmissions fills them so that element i of each
// vector describes transmission event i.
std::vector<int> date, source, target, virus, source_exposure_date;
db.get_transmissions(date, source, target, virus, source_exposure_date);

for (size_t i = 0; i < date.size(); ++i) {
    std::cout << "Transmission on day " << date[i]
              << " from agent " << source[i]
              << " to agent " << target[i]
              << " with virus " << virus[i] << '\n';
}

This example retrieves the transmission network data and prints it to the console. Each transmission event is described by its date, source, target, and the virus involved.
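Once the transmission vectors are available, simple network summaries can be derived from them. The following standalone sketch counts the transmissions caused by each source agent (its out-degree in the network) and computes, for each event, the gap between the source's exposure date and the transmission date. Both function names are hypothetical and not part of epiworld, and the sketch assumes every entry in `source` is a real agent id; if the data contain events without an infecting agent (for example, seeded cases marked with a sentinel value), those would need to be filtered out first.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not an epiworld function): number of transmissions
// attributed to each source agent. Agents that never transmitted do not
// appear in the result.
std::map<int, int> count_secondary_cases(const std::vector<int> & source) {
    std::map<int, int> cases;
    for (int s : source)
        ++cases[s]; // operator[] value-initializes missing keys to 0
    return cases;
}

// Hypothetical helper (not an epiworld function): for each event, the time
// between the source's own exposure and the transmission to the target.
std::vector<int> generation_intervals(
    const std::vector<int> & date,
    const std::vector<int> & source_exposure_date
) {
    assert(date.size() == source_exposure_date.size());
    std::vector<int> gaps;
    gaps.reserve(date.size());
    for (std::size_t i = 0; i < date.size(); ++i)
        gaps.push_back(date[i] - source_exposure_date[i]);
    return gaps;
}
```

With a single virus, the per-agent counts correspond to the number of secondary cases each infected agent produced; with several viruses, the counts would need to be split by the `virus` vector as well.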