Marvin: distributed reasoning over large-scale Semantic Web data

Eyal Oren, Spyros Kotoulas, George Anadiotis, Ronny Siebes, Annette ten Teije, Frank van Harmelen

Abstract


Many Semantic Web problems are difficult to solve through common divide-and-conquer strategies, since they are hard to partition. We present Marvin, a parallel and distributed platform for processing large amounts of RDF data on a network of loosely coupled peers. We present our divide-conquer-swap strategy and show that this model converges towards completeness.

Within this strategy, we address the problem of making distributed reasoning scalable and load-balanced. We present SpeedDate, a routing strategy that combines data clustering with random exchanges. The random exchanges ensure load balancing, while the data clustering attempts to maximise efficiency. We compare SpeedDate against random and deterministic (DHT-like) approaches in terms of performance and load balancing.

In simulation, we vary parameters such as system size, data distribution, churn rate, and network topology. The results indicate that SpeedDate is near-optimally balanced, performs in the same order of magnitude as a DHT-like approach, and has an average throughput per node that scales with √i for i items in the system. We evaluate our overall Marvin system for performance, scalability, load balancing and efficiency.


Type of Paper: Research Paper
Keywords: Distributed; Reasoning; Scalability; Load-balancing