Wednesday, October 2, 2013

New preprint: An Automatic Key Discovery Approach for Data Linking

A new preprint is available on the JWS preprint server.

Nathalie Pernelle, Fatiha Saïs and Danai Symeonidou, An Automatic Key Discovery Approach for Data Linking, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: In the context of Linked Data, different kinds of semantic links can be established between data. However, when data sources are huge, detecting such links manually is not feasible. One of the most important types of links, the identity link, expresses that different identifiers refer to the same real-world entity. Some automatic data linking approaches use keys to infer identity links; nevertheless, this kind of knowledge is rarely available. In this work, we propose KD2R, an approach which allows the automatic discovery of composite keys in RDF data sources that may conform to different schemas. We only consider data sources for which the Unique Name Assumption is fulfilled. The obtained keys are correct with respect to the RDF data sources in which they are discovered. The proposed algorithm is scalable since it allows key discovery without having to scan all the data. KD2R has been tested on real datasets of the international contest OAEI 2010 and on datasets available on the web of data, and has obtained promising results.
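
To make the notion of a composite key concrete, here is a minimal Python sketch. It is not the KD2R algorithm: it is a naive brute-force check that compares every pair of instances, which is exactly the kind of full scan the abstract says KD2R avoids. The toy data, the property names, and the functions is_key and minimal_keys are all hypothetical, chosen only for illustration. The sketch assumes one common reading of a key under the Unique Name Assumption: a set of properties is a key if no two distinct instances share a value on every property in the set.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each instance identifier maps
# property names to the set of values it has for that property.
people = {
    "p1": {"firstName": {"Ann"}, "lastName": {"Lee"}, "city": {"Paris"}},
    "p2": {"firstName": {"Ann"}, "lastName": {"Roy"}, "city": {"Paris"}},
    "p3": {"firstName": {"Bob"}, "lastName": {"Lee"}, "city": {"Lyon"}},
}


def is_key(instances, props):
    """Return True if no two distinct instances share a value on every property.

    Assumption: a missing property is treated as an empty value set, so it can
    never produce a shared value.
    """
    for a, b in combinations(instances.values(), 2):
        if all(a.get(p, set()) & b.get(p, set()) for p in props):
            return False  # a and b agree on all of props, so props is not a key
    return True


def minimal_keys(instances):
    """Enumerate minimal keys by brute force, smallest property sets first."""
    props = sorted({p for inst in instances.values() for p in inst})
    keys = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_key(instances, combo):
                keys.append(combo)
    return keys


print(minimal_keys(people))
```

On this toy data no single property is a key, since each one has a value shared by two instances, so any key found must be composite. Note that the brute-force search is exponential in the number of properties and is meant only to illustrate the definition.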