An Unsupervised Instance Matcher for Schema-free RDF Data

Mayank Kejriwal, Dani Miranker


This article presents an unsupervised system that performs instance matching between entities in schema-free Resource Description Framework (RDF) files. Rather than relying on domain expertise or manually labeled samples, the system automatically generates its own heuristic training set. The training sets are first used by the system to align the properties in the input graphs. The property alignment and training sets are used together to simultaneously learn two functions, one for the blocking step of instance matching and the other for the classification step. Finally, the learned functions are used to perform instance matching. The full system is implemented as a sequence of components that can be iteratively executed to boost performance. Evaluations on a suite of ten test cases show individual components to be competitive with state-of-the-art baselines. The system as a whole is shown to compete effectively with adaptive supervised approaches.
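
The pipeline described in the abstract (heuristic training set, property alignment, blocking, classification) can be illustrated with a toy Python sketch. This is not the authors' implementation: entities are assumed to be plain dictionaries of property values, all function names are hypothetical, and fixed Jaccard thresholds stand in for the learned blocking and classification functions.

```python
from collections import defaultdict
from itertools import product


def words(value):
    """Lowercased whitespace tokens of a single property value."""
    return set(str(value).lower().split())


def tokens(entity):
    """All tokens across every property value of an entity."""
    return {tok for value in entity.values() for tok in words(value)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def heuristic_training_set(source, target, threshold=0.5):
    """Label likely matches with no human input (toy, quadratic comparison)."""
    return [
        (s_id, t_id)
        for (s_id, s), (t_id, t) in product(source.items(), target.items())
        if jaccard(tokens(s), tokens(t)) >= threshold
    ]


def align_properties(source, target, pairs):
    """Map each source property to the target property whose values
    most often agree exactly on the heuristic training pairs."""
    votes = defaultdict(lambda: defaultdict(int))
    for s_id, t_id in pairs:
        for sp, sv in source[s_id].items():
            for tp, tv in target[t_id].items():
                if str(sv).lower() == str(tv).lower():
                    votes[sp][tp] += 1
    return {sp: max(cands, key=cands.get) for sp, cands in votes.items()}


def block(source, target, alignment):
    """Candidate pairs that share a token on any aligned property,
    found through an inverted index over the target entities."""
    index = defaultdict(set)
    for t_id, t in target.items():
        for tp in set(alignment.values()):
            for tok in words(t.get(tp, "")):
                index[tok].add(t_id)
    candidates = set()
    for s_id, s in source.items():
        for sp in alignment:
            for tok in words(s.get(sp, "")):
                candidates.update((s_id, t_id) for t_id in index.get(tok, ()))
    return candidates


def classify(source, target, candidates, alignment, threshold=0.5):
    """Keep candidates whose mean similarity over aligned properties
    reaches the threshold (a stand-in for a learned classifier)."""
    matches = []
    for s_id, t_id in sorted(candidates):
        sims = [
            jaccard(words(source[s_id][sp]), words(target[t_id][tp]))
            for sp, tp in alignment.items()
            if sp in source[s_id] and tp in target[t_id]
        ]
        if sims and sum(sims) / len(sims) >= threshold:
            matches.append((s_id, t_id))
    return matches


# Made-up example data: two graphs with different property names.
source = {
    "a1": {"name": "Alan Turing", "born": "1912"},
    "a2": {"name": "Grace Hopper", "born": "1906"},
}
target = {
    "b1": {"label": "Alan Turing", "birthYear": "1912"},
    "b2": {"label": "Grace Hopper", "birthYear": "1906"},
    "b3": {"label": "Alan Kay", "birthYear": "1940"},
}

training = heuristic_training_set(source, target)
alignment = align_properties(source, target, training)
candidates = block(source, target, alignment)
print(classify(source, target, candidates, alignment))
# [('a1', 'b1'), ('a2', 'b2')]
```

In this toy run, blocking also proposes the pair (a1, b3) because both names contain "Alan", and the classification step then rejects it. That division of labor, a cheap candidate filter followed by a stricter pairwise decision, is the part of the sketch that corresponds to the two functions mentioned in the abstract.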

Type of Paper: Research Paper
Keywords: Instance Matching, Unsupervised System, Schema-free data, Linked Data, Automatic Training Set Generation, Feature Selection, Property Alignment, Modularity