This article addresses one challenge of simulation studies: traceability. We define what the term means and what it does not mean, and we illustrate how traceability can be achieved with established methods from software engineering.
What is traceability
The term traceability describes the fact that all steps necessary to produce results from a simulation study are linked with each other. Traceability is therefore the foundation of reproducible simulation results. A basic workflow for producing results from a simulation study consists of the following steps:
- Development of the simulation model
- Configuring the simulation model
- Executing the simulation study
- Collecting the simulation results
- Post-processing the simulation results, e.g. into plots
With traceability, all of these steps can be linked. For example, a plot can be linked to the execution environment used to run the study. The execution environment is linked to the configuration of the simulation model, and the configuration is in turn linked to the source code of the model itself.
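The chain of links described above can be sketched as a small data structure. This is only an illustration: the class names, the fields and the `trace` helper are hypothetical and not part of any particular simulation framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSource:
    revision: str  # identifier of the model's source code version


@dataclass(frozen=True)
class Configuration:
    name: str
    model: ModelSource  # link to the model source code


@dataclass(frozen=True)
class ExecutionEnvironment:
    name: str
    configuration: Configuration  # link to the model configuration


@dataclass(frozen=True)
class Plot:
    filename: str
    environment: ExecutionEnvironment  # link to the execution environment


def trace(plot):
    """Follow the links from a plot back to the model source code."""
    env = plot.environment
    config = env.configuration
    return [plot.filename, env.name, config.name, config.model.revision]
```

Calling `trace` on a plot returns the plot file, the execution environment, the configuration and the model revision, in that order. If any of these links is missing, the chain cannot be followed back to the source code.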
It is important to mention that traceability does not guarantee that the simulation results are correct. This brings us to the next section, which discusses what traceability is not.
What traceability is not
Traceability must not be confused with the validity or credibility of simulations. While there are methods, some of them also known from software engineering, to ensure the correctness of simulations, it is in general much more difficult to guarantee their credibility.
This topic deserves its own article and will be covered in a later post.
How to achieve traceability
Traceability is already widely practiced in software engineering, even if it is not known by this name. The remainder of this article discusses the relevant concepts and illustrates how they can be applied to simulation studies.
Version control
A version control system keeps a complete history of the changes applied to a set of files. This makes it possible to track all changes and to restore older versions. Each version is identified by a unique identifier. While version control systems can manage all kinds of files, they work especially well for text-based files, because the changes can be displayed directly and read by the users. Therefore, everything used in a simulation study should be managed by a version control system: the source code of the model, the configuration files, the scripts that execute the simulation or collect the results, and also the results themselves.
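As a minimal sketch, the unique identifier of the current version can be queried from a script so that it can be stored alongside the results. The example assumes that the study is kept in a Git repository and that the `git` command is installed. The article does not prescribe a particular version control system, and the function name `current_revision` is hypothetical.

```python
import subprocess
from typing import Optional


def current_revision(repo_dir: str = ".") -> Optional[str]:
    """Return the commit id of the checked-out version, or None if unavailable.

    A "-dirty" suffix marks uncommitted changes, because results produced
    from a modified working tree cannot be traced back to a commit alone.
    """
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
        status = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository
        return None
    return commit + "-dirty" if status else commit
```

Marking a modified working tree is a design choice of this sketch. A stricter setup might refuse to run the study at all until every change has been committed.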
Versioning
If the simulation depends on external libraries or tools, their exact versions should be included in the configuration of the study. The configuration itself should be managed by a version control system. In this way the different components are linked and traceability is achieved.
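For a study implemented in Python, the exact versions could be collected as sketched below. This uses the standard library module `importlib.metadata`. The function names and the layout of the resulting record are assumptions made for illustration.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def dependency_versions(packages):
    """Map each package name to its installed version (None if not installed)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def environment_record(packages):
    """Collect interpreter and library versions for the study configuration."""
    return {
        "python": platform.python_version(),
        "packages": dependency_versions(packages),
    }
```

A call such as `environment_record(["numpy", "scipy"])`, where the package names are placeholders, yields a dictionary that can be written to a configuration file and committed together with the rest of the study.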
Metadata
Finally, metadata makes it possible to embed additional information in other data, e.g. plots of simulation results. A plot should contain the exact version of the simulation environment as metadata. It is then traceable how the plot was created, and consequently how the simulation model was executed and configured and which exact version of the simulation model was used.
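As one possible illustration, the sketch below stores such information in a plot saved as SVG, a format that provides a `<metadata>` element for this purpose. Storing the information as JSON, the helper names and the example keys are assumptions of this sketch. Other plot formats offer comparable mechanisms.

```python
import json
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep SVG as the default namespace on output


def embed_metadata(svg_text, info):
    """Return the SVG document with `info` stored as JSON in a <metadata> element."""
    root = ET.fromstring(svg_text)
    element = ET.Element(f"{{{SVG_NS}}}metadata")
    element.text = json.dumps(info, sort_keys=True)
    root.insert(0, element)
    return ET.tostring(root, encoding="unicode")


def read_metadata(svg_text):
    """Return the embedded dictionary, or None if the plot carries no metadata."""
    root = ET.fromstring(svg_text)
    element = root.find(f"{{{SVG_NS}}}metadata")
    return None if element is None else json.loads(element.text)
```

Given a plot tagged this way, for example with a hypothetical dictionary such as `{"environment": "env-1.0"}`, `read_metadata` recovers the version of the simulation environment directly from the file. From there, the remaining links lead back to the configuration and the model source.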