Modelling Provenance of DBpedia Resources Using Wikipedia Contributions
Abstract
DBpedia is one of the largest datasets in the Linked Open Data cloud. Its centrality and its
cross-domain nature make it one of the most important and most frequently referenced knowledge bases on the
Web of Data, where it is generally used as a reference for data interlinking. Yet, in spite of its authoritative status,
no work so far has tackled the provenance of DBpedia statements. Because DBpedia is extracted from
Wikipedia, an open and collaborative encyclopedia, delivering provenance information about its statements would
help to establish the trustworthiness of its data, a major need for people building applications on
DBpedia data. To address this problem, we propose an approach for modelling and managing
provenance on DBpedia using Wikipedia edits, and for making this information available on the Web of
Data.
In this paper, we describe the framework that we implemented to do so, consisting of (1) a lightweight
modelling solution to semantically represent the provenance of both DBpedia resources and Wikipedia
content, along with mappings to popular ontologies such as the W7 (what, when, where, how, who,
which, and why) and OPM (Open Provenance Model) models; (2) an information extraction
process and a provenance-computation system combining Wikipedia articles' history with DBpedia
information; and (3) a set of scripts that make provenance information about DBpedia statements directly
available when browsing this source, and that expose it publicly in RDF so that software
agents can consume it.
