Provenance-Based Reproducibility in the Semantic Web
Abstract
Reproducibility is a crucial property of data since it allows users to understand and verify
how data was derived, and therefore allows them to put their trust in such data. Reproducibility is
essential for science, because the reproducibility of experimental results is a tenet of the scientific
method, but reproducibility is also beneficial in many other fields, including automated decision
making, visualization, and automated data feeds. To achieve the vision of reproducibility, the
workflow-based community has strongly advocated the use of provenance as an underpinning
mechanism for reproducibility, since a rich representation of provenance allows steps to be
reproduced and all intermediary and final results to be checked and validated. Concurrently, multiple
ontology-based representations of provenance have been devised to describe past
computations uniformly across a variety of technologies. However, such Semantic Web
representations of provenance do not have any formal link with execution. Even assuming a faithful
and non-malicious environment, how can we claim that an ontology-based representation of
provenance enables reproducibility, since it has not been given any execution semantics, and therefore
has no formal way of expressing the reproduction of computations? This is the problem that this paper
tackles by defining a denotational semantics for the Open Provenance Model, which is referred to as
the reproducibility semantics. This semantics is used to implement a reproducibility service that
leverages multiple Semantic Web technologies and offers a variety of reproducibility approaches
found in the literature. A series of empirical experiments was designed to exhibit the range of
reproducibility capabilities of our approach; in particular, we demonstrate the ability to reproduce
computations involving multiple technologies, as is commonly found on the Web.
