WebPIE: A Web-scale parallel inference engine using MapReduce

Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, Henri Bal


The large and rapidly growing volume of Semantic Web data poses a significant computational challenge for efficient and scalable reasoning. At large scale, the resources of a single machine are no longer sufficient, and the process must be distributed to improve performance. In this article, we propose a distributed technique to perform materialization under the RDFS and OWL ter Horst semantics using the MapReduce programming model. We show that a straightforward implementation is neither efficient nor scalable. Our technique addresses the challenge of distributed reasoning through a set of algorithms which, combined, significantly increase performance. We have implemented WebPIE (Web-scale Inference Engine) and we demonstrate its performance on a cluster of up to 64 nodes. We have evaluated our system using very large real-world datasets (Bio2RDF, LLD, FactForge and the Billion Triple Challenge dataset) and the LUBM synthetic benchmark, scaling up to 100 billion triples. Results show that our implementation scales linearly and vastly outperforms current systems in terms of maximum data size and inference speed.
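
To make the idea of materialization with MapReduce concrete, the following is a minimal single-process sketch in Python of how one standard RDFS entailment rule (rdfs9: if s has type x and x is a subclass of y, then s has type y) could be expressed as a map phase, a grouping step, and a reduce phase. It is an illustration only, not WebPIE's implementation, and it includes none of the optimizations mentioned above; the function names and the ex: example triples are hypothetical, and the in-memory shuffle stands in for what a MapReduce framework would do across nodes.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs keyed on the class shared by both premises of rdfs9."""
    for s, p, o in triples:
        if p == RDF_TYPE:
            # (s rdf:type x): join on x
            yield o, ("instance", s)
        elif p == RDFS_SUBCLASS:
            # (x rdfs:subClassOf y): join on x
            yield s, ("superclass", o)


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle step (assumption)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join instances and superclasses that share a key and emit the derived type triples."""
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in superclasses:
            yield (inst, RDF_TYPE, sup)


def apply_rdfs9_once(triples):
    """Run one map/shuffle/reduce pass and return only triples not already present."""
    derived = set()
    for values in shuffle(map_phase(triples)).values():
        derived.update(reduce_phase(values))
    return derived - set(triples)


def materialize(triples):
    """Repeat the pass until no new triples are derived (a naive fixpoint loop)."""
    closure = set(triples)
    while True:
        new = apply_rdfs9_once(closure)
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    data = {
        ("ex:alice", RDF_TYPE, "ex:Student"),
        ("ex:Student", RDFS_SUBCLASS, "ex:Person"),
        ("ex:Person", RDFS_SUBCLASS, "ex:Agent"),
    }
    for triple in sorted(materialize(data) - data):
        print(triple)
```

In this naive form, each pass re-reads the whole dataset and the fixpoint loop needs one pass per level of the class hierarchy, which hints at why a straightforward implementation becomes costly at large scale.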

Type of Paper: research paper
Keywords: Semantic Web; MapReduce; high performance; distributed computing; reasoning