PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Campinas S., Ceccarelli D., Perry T., Delbru R., Tummarello G., Balog K. The Sindice-2011 dataset for entity-oriented search in the Web of data. In: EOS 2011 - 1st International Workshop on Entity-Oriented Search (Beijing, 28 July 2011). Proceedings, vol. 1 pp. 26 - 32. Balog K., de Vries A.P., Serdyukov P., Wen, J-R (eds.). TU Delft, 2011.
 
 
Abstract
(English)
The task of entity retrieval is becoming increasingly prevalent as more and more (semi-)structured information about objects becomes available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats and others). However, research and development in that direction depends on (1) the availability of a representative corpus of entities that are found on the Web, and (2) the availability of an entity-oriented search infrastructure for experimenting with new retrieval models. In this paper, we introduce the Sindice-2011 data collection, which is derived from the data collected by the Sindice semantic search engine. The data collection is designed specifically to support research in the domain of web entity retrieval. We describe how the corpus is organised, discuss statistics of the data collection, and introduce a search infrastructure to foster research and development.
URL: http://research.microsoft.com/en-us/um/beijing/events/eos2011/
Subject: Entity search
Web of Data
Entity corpus
H.3.1 Content Analysis and Indexing
H.3.3 Information Search and Retrieval
H.3.4 Systems and Software



 


For further information, contact: Librarian http://puma.isti.cnr.it
