SINA: Semantic Interpretation of User Queries for Question Answering on Interlinked Data

Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo, Sören Auer

Abstract


The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer to a given user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the datasets on both the schema and instance levels.
We present Sina, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-language queries into conjunctive SPARQL queries over a set of interlinked data sources. Sina uses a hidden Markov model to determine the most suitable resources from different datasets for a user-supplied query. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets being queried. We evaluate Sina over three different datasets. Sina answers 25 queries from QALD-1 correctly. Moreover, it performs as well as the best question answering system from the QALD-3 competition, answering 32 questions correctly, while also being able to answer queries over distributed sources. We study the runtime of Sina in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data.
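As a rough illustration of how a hidden Markov model can pick one resource per keyword, the sketch below runs Viterbi decoding over candidate resources. Hidden states are candidate resources, observations are keywords, emission probabilities score how well a resource matches a keyword, and transition probabilities favour resources that fit together. The resource identifiers (`ex:...`), all probabilities, and the `FLOOR` smoothing value are hypothetical and chosen for illustration only; they are not Sina's actual state space or parameter estimates.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical smoothing value for unseen start/transition/emission pairs.
FLOOR = 1e-12


def log_p(table: Dict, key, floor: float = FLOOR) -> float:
    """Log of a probability lookup, falling back to a small floor for unseen keys."""
    return math.log(table.get(key, floor))


def viterbi(
    keywords: List[str],
    candidates: Dict[str, List[str]],
    start_p: Dict[str, float],
    trans_p: Dict[Tuple[str, str], float],
    emit_p: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Return the most likely resource sequence for the keywords and its probability."""
    first = keywords[0]
    # best maps a candidate resource to (log-prob of the best path ending there, that path).
    best = {
        r: (log_p(start_p, r) + log_p(emit_p, (r, first)), [r])
        for r in candidates[first]
    }
    for kw in keywords[1:]:
        step = {}
        for r in candidates[kw]:
            # Pick the predecessor that maximizes path score plus transition score.
            prev_score, prev_path = max(
                ((s + log_p(trans_p, (prev, r)), path) for prev, (s, path) in best.items()),
                key=lambda item: item[0],
            )
            step[r] = (prev_score + log_p(emit_p, (r, kw)), prev_path + [r])
        best = step
    score, path = max(best.values(), key=lambda item: item[0])
    return path, math.exp(score)


# Hypothetical example: "berlin" is a homonym (a city or a band).
keywords = ["berlin", "mayor"]
candidates = {
    "berlin": ["ex:Berlin_City", "ex:Berlin_Band"],
    "mayor": ["ex:mayor", "ex:Mayor_Class"],
}
start_p = {"ex:Berlin_City": 0.6, "ex:Berlin_Band": 0.4}
emit_p = {
    ("ex:Berlin_City", "berlin"): 0.9,
    ("ex:Berlin_Band", "berlin"): 0.9,
    ("ex:mayor", "mayor"): 0.8,
    ("ex:Mayor_Class", "mayor"): 0.7,
}
trans_p = {
    ("ex:Berlin_City", "ex:mayor"): 0.7,
    ("ex:Berlin_City", "ex:Mayor_Class"): 0.3,
    ("ex:Berlin_Band", "ex:mayor"): 0.05,
    ("ex:Berlin_Band", "ex:Mayor_Class"): 0.05,
}

path, prob = viterbi(keywords, candidates, start_p, trans_p, emit_p)
print(path, round(prob, 4))  # ['ex:Berlin_City', 'ex:mayor'] 0.3024
```

Although both readings of "berlin" emit the keyword equally well, the transition to a mayor property resolves the homonym in favour of the city. Working in log space avoids underflow when queries contain many keywords.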

Type of Paper: Research Paper
Keywords: Keyword search, Question answering, Hidden Markov model, SPARQL, RDF, Linked Data, Disambiguation