Modelling Provenance of DBpedia Resources Using Wikipedia Contributions

Fabrizio Orlandi, Alexandre Passant


DBpedia is one of the largest datasets in the Linked Open Data cloud. Its centrality and cross-domain nature make it one of the most important and most frequently referenced knowledge bases on the Web of Data, and it is generally used as a reference for data interlinking. Yet, in spite of its authoritative status, no work so far has tackled the provenance of DBpedia statements. Because DBpedia is extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about its statements would help to ensure the trustworthiness of its data, a major need for people building applications on DBpedia. To address this problem, we propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and for making this information available on the Web of Data.

In this paper, we describe the framework that we implemented to do so. It consists of (1) a lightweight modelling solution to semantically represent the provenance of both DBpedia resources and Wikipedia content, along with mappings to popular models such as W7 (what, when, where, how, who, which, and why) and OPM (the Open Provenance Model); (2) an information extraction process and a provenance-computation system combining Wikipedia articles' history with DBpedia information; and (3) a set of scripts that make provenance information about DBpedia statements directly available when browsing this source, and that publicly expose it in RDF so that software agents can consume it.
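To make the idea of combining an article's edit history with extracted statements more concrete, the following is a minimal sketch only. The abstract does not specify the actual provenance-computation algorithm, so the approach shown here (walking back through revisions to find the edit that introduced each current value) is an assumption, as are the names `Revision` and `attribute_statements` and the premise that infobox content has already been parsed into property/value pairs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Revision:
    """Hypothetical, simplified view of one Wikipedia revision."""
    rev_id: int
    author: str
    timestamp: str            # ISO 8601, e.g. "2011-03-01T10:00:00Z"
    infobox: Dict[str, str]   # assumed: already-parsed property/value pairs


def attribute_statements(history: List[Revision]) -> Dict[str, Revision]:
    """Map each property of the latest revision to the revision that
    introduced its current value (start of the most recent unbroken run)."""
    if not history:
        return {}
    # ISO 8601 timestamps in a uniform format sort chronologically as strings.
    ordered = sorted(history, key=lambda r: r.timestamp)
    latest = ordered[-1]
    provenance: Dict[str, Revision] = {}
    for prop, value in latest.infobox.items():
        origin = latest
        # Walk backwards until the value differs or the property is absent.
        for rev in reversed(ordered[:-1]):
            if rev.infobox.get(prop) != value:
                break
            origin = rev
        provenance[prop] = origin
    return provenance


# Illustrative, made-up history for a single article.
sample_history = [
    Revision(1, "alice", "2011-01-01T00:00:00Z", {"population": "100"}),
    Revision(2, "bob", "2011-02-01T00:00:00Z",
             {"population": "120", "country": "Ireland"}),
    Revision(3, "carol", "2011-03-01T00:00:00Z",
             {"population": "120", "country": "Ireland", "mayor": "X"}),
]

for prop, rev in attribute_statements(sample_history).items():
    print(f"{prop}: introduced by {rev.author} in revision {rev.rev_id}")
```

In a fuller pipeline, each (property, revision) pair could then be serialised in RDF with a provenance vocabulary; that step is omitted here because the abstract does not detail the vocabulary terms used.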

Type of Paper: Research Paper
Keywords: Provenance; Linked Data; DBpedia; Wikipedia