Using Semantic Data to Improve Cross-Lingual Linking of Article Clusters

Evgenia Belyaeva, Aljaz Kosmerlj, Andrej Muhic, Jan Rupnik, Flavio Fuart


This paper presents a system that uses semantic data to improve cross-lingual linking of news article clusters. Two approaches are compared. The first is based on two different Canonical Correlation Analysis (CCA) feature vector definitions, MAX-CCA and SUM-CCA, whereas the second combines the better-performing CCA approach with entity vectors. The aim of the comparison was to determine whether taking the semantic aspect of news into account increases performance and improves linking. Evaluations of these techniques on a news corpus, both against Google News and manually, showed good performance of our system. The overall gain in precision and recall when using entity vectors was significant.

Type of Paper: Research Paper
Keywords: semantic data, natural language processing, cross-linguality, canonical correlation analysis