Robust and Scalable Linked Data Reasoning Incorporating Provenance and Trust Annotations

Piero A Bonatti, Aiden Hogan, Axel Polleres, Luigi Sauro


In this paper, we leverage annotated logic programs for tracking indicators of provenance
and trust during reasoning, specifically focussing on the use-case of applying a scalable subset of OWL
2 RL/RDF rules over static corpora of arbitrary Linked Data (Web data). Our annotations encode three
facets of information: (i) blacklist: a (possibly manually generated) boolean annotation which indicates
that the referent data are known to be harmful and should be ignored during reasoning; (ii) ranking: a
numeric value, derived by a PageRank-inspired technique adapted for Linked Data, which
determines the centrality of certain data artefacts (such as RDF documents and statements); (iii)
authority: a boolean value which uses Linked Data principles to conservatively determine whether or
not some terminological information can be trusted.
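As a rough illustration of the ranking facet, the following is a minimal power-iteration PageRank over a document-level link graph. It is a generic PageRank sketch, not the authors' Linked Data adaptation: the input format (a dict from each document to the documents it links to), the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration, and `pagerank` is a hypothetical helper name.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Generic power-iteration PageRank (illustrative sketch only).

    `links` maps each document to the documents it links to; this input
    format is a hypothetical assumption, not the paper's data model.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Mass held by dangling documents (no out-links) is spread uniformly,
        # so the ranks keep summing to 1 after every iteration.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


# Usage: C is linked from both A and B, so it should end up most central.
docs = {"A": ["C"], "B": ["C"], "C": ["A"], "D": []}
ranks = pagerank(docs)
```

Statement-level ranks could then be derived from document ranks, for example by aggregating the ranks of the documents in which a statement appears; that aggregation choice is likewise an assumption here rather than something stated in this abstract.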
We formalise a logical framework which annotates inferences with the strength of derivation along
these dimensions of trust and provenance; we formally demonstrate that annotated logic programming in
our setting has several desirable properties, guaranteeing (i) a unique minimal model (least fixpoint);
(ii) monotonicity; (iii) finitariness; and (iv) decidability. In so doing, we also give formal results
that reveal strategies for scalable and efficient implementation of various reasoning tasks.
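To make the idea of annotating inferences with a "strength of derivation" concrete, here is a minimal sketch of a least-fixpoint computation over annotated triples, using only the OWL 2 RL/RDF rule cax-sco. The annotation triple `Ann(blacklisted, authoritative, rank)`, the componentwise combination (weakest premise for a single derivation, strongest value across alternative derivations) and the treatment of authority as applying equally to all premises are simplifying assumptions for illustration; they are not taken from the paper's formal definitions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Ann:
    blacklisted: bool     # assumed: True means the data should be distrusted
    authoritative: bool   # assumed: True means the source is authoritative
    rank: float           # assumed: a PageRank-style centrality score


def glb(a: Ann, b: Ann) -> Ann:
    # One derivation is only as strong as its weakest premise.
    return Ann(a.blacklisted or b.blacklisted,
               a.authoritative and b.authoritative,
               min(a.rank, b.rank))


def lub(a: Ann, b: Ann) -> Ann:
    # Alternative derivations of the same triple keep the strongest value.
    return Ann(a.blacklisted and b.blacklisted,
               a.authoritative or b.authoritative,
               max(a.rank, b.rank))


def least_fixpoint(facts: Dict[Triple, Ann]) -> Dict[Triple, Ann]:
    """Naive forward chaining of cax-sco until no annotation changes:
    (?x rdf:type ?c1), (?c1 rdfs:subClassOf ?c2) -> (?x rdf:type ?c2)."""
    model = dict(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current facts so the model can be updated in the loop.
        types = [(t, a) for t, a in model.items() if t[1] == "rdf:type"]
        subs = [(t, a) for t, a in model.items() if t[1] == "rdfs:subClassOf"]
        for (x, _, c1), a1 in types:
            for (d, _, c2), a2 in subs:
                if c1 != d:
                    continue
                head = (x, "rdf:type", c2)
                ann = glb(a1, a2)
                new = lub(model[head], ann) if head in model else ann
                if model.get(head) != new:
                    model[head] = new
                    changed = True
    return model


facts = {
    (":alice", "rdf:type", ":Student"): Ann(False, True, 0.6),
    (":Student", "rdfs:subClassOf", ":Person"): Ann(False, True, 0.9),
    (":Person", "rdfs:subClassOf", ":Agent"): Ann(False, False, 0.4),
}
m = least_fixpoint(facts)
# m[(":alice", "rdf:type", ":Agent")] == Ann(False, False, 0.4)
```

Because annotations only move upward and can take only finitely many values built from the input annotations, the loop terminates; this mirrors, in a very small way, why monotonicity and finitariness matter for a unique least fixpoint.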
Thereafter, we discuss scalable and distributed implementation strategies for applying our ranking and
reasoning methods over a cluster of commodity hardware; throughout, we evaluate our methods over
1 billion Linked Data quadruples crawled from approximately 4 million Web documents, empirically
demonstrating the scalability of our approach and how our annotation values help ensure a more robust
form of reasoning. Finally, we sketch, discuss and evaluate a use-case for a simple repair of
inconsistencies detectable through OWL 2 RL/RDF constraint rules, using ranking annotations to detect
and defeat the "marginal view"; in so doing, we infer an empirical "consistency threshold" for the Web
of Data in our setting.
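The repair use-case can be pictured with a deliberately simplified sketch: detect violations of one OWL 2 RL/RDF constraint rule (cax-dw, class disjointness) over ranked asserted triples, then repeatedly defeat the lowest-ranked triple involved in a clash. The policy of dropping the single lowest-ranked premise, the restriction to asserted (non-inferred) triples, and the helper names `find_disjointness_clashes` and `repair` are all hypothetical choices for illustration, not the paper's repair procedure.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def find_disjointness_clashes(facts: Dict[Triple, float]) -> List[Set[Triple]]:
    """Return premise sets violating cax-dw:
    (?c1 owl:disjointWith ?c2), (?x rdf:type ?c1), (?x rdf:type ?c2) -> false."""
    clashes = []
    types = [t for t in facts if t[1] == "rdf:type"]
    for dw in [t for t in facts if t[1] == "owl:disjointWith"]:
        c1, c2 = dw[0], dw[2]
        for t1 in types:
            if t1[2] != c1:
                continue
            t2 = (t1[0], "rdf:type", c2)
            if t2 in facts:
                clashes.append({dw, t1, t2})
    return clashes


def repair(facts: Dict[Triple, float]) -> Dict[Triple, float]:
    """Assumed policy: defeat the lowest-ranked premise ("marginal view")
    of the first remaining clash until no clash is left."""
    kept = dict(facts)
    while True:
        clashes = find_disjointness_clashes(kept)
        if not clashes:
            return kept
        marginal = min(clashes[0], key=lambda t: kept[t])
        del kept[marginal]


facts = {
    (":Person", "owl:disjointWith", ":Organisation"): 0.9,
    (":acme", "rdf:type", ":Organisation"): 0.7,
    (":acme", "rdf:type", ":Person"): 0.1,
}
repaired = repair(facts)  # the low-ranked (":acme", "rdf:type", ":Person") is dropped
```

Each loop iteration removes one triple, so the procedure always terminates; a real implementation would also need to trace inferred clashes back to their asserted premises.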

Type of Paper: Research Paper
Keywords: annotated programs; linked data; web reasoning; scalable reasoning; distributed reasoning; authoritative reasoning; owl 2 rl; provenance; pagerank; inconsistency; repair