Istituto di Scienza e Tecnologie dell'Informazione     
Tonellotto N., Silvestri F., Perego R. Representing document lengths with identifiers. In: ECIR 2011 - Advances in Information Retrieval. 33rd European Conference on IR Research (Dublin, Ireland, 18-21 April 2011). Proceedings, pp. 665-669. Paul Clough, Colum Foley, Cathal Gurrin, Gareth J.F. Jones, Wessel Kraaij, Hyowon Lee, Vanessa Murdock (eds.). (Lecture Notes in Computer Science, vol. 6611). Springer, 2011.
Most common text retrieval scoring functions need the length of each indexed document to rank it with respect to the current query. For efficiency, information retrieval systems keep this information in main memory. This paper proposes a novel strategy that encodes the length of each document directly in its document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically. The paper discusses the implications of adopting the proposed technique and reports encouraging results from experiments conducted on the 2009 TREC Web Track dataset.
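The general idea in the abstract can be illustrated with a minimal sketch. It is not the function used in the paper, which this record does not give. The sketch assumes that identifiers are assigned in order of non-decreasing document length and that the approximate length is recovered by linear interpolation over a few stored sample points. The names `assign_ids_by_length`, `build_samples`, `approx_length` and the `step` parameter are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def assign_ids_by_length(doc_lengths):
    """Assign identifiers by non-decreasing length (assumed scheme).

    Returns (order, sorted_lengths) where order[new_id] is the original
    position of the document that receives identifier new_id.
    """
    order = sorted(range(len(doc_lengths)), key=lambda i: doc_lengths[i])
    return order, [doc_lengths[i] for i in order]


def build_samples(sorted_lengths, step):
    """Keep every step-th (doc_id, length) pair, plus the last document.

    Assumes a non-empty collection and step >= 1.
    """
    n = len(sorted_lengths)
    ids = list(range(0, n, step))
    if ids[-1] != n - 1:
        ids.append(n - 1)
    return ids, [sorted_lengths[i] for i in ids]


def approx_length(doc_id, sample_ids, sample_lengths):
    """Approximate a length from the identifier alone.

    Uses linear interpolation between the two nearest stored samples
    (an illustrative choice). Assumes 0 <= doc_id <= sample_ids[-1].
    """
    j = bisect_right(sample_ids, doc_id) - 1
    if j >= len(sample_ids) - 1:
        return float(sample_lengths[-1])
    x0, x1 = sample_ids[j], sample_ids[j + 1]
    y0, y1 = sample_lengths[j], sample_lengths[j + 1]
    return y0 + (y1 - y0) * (doc_id - x0) / (x1 - x0)


if __name__ == "__main__":
    lengths = [120, 30, 75, 400, 60, 90, 250, 45]
    order, sorted_lengths = assign_ids_by_length(lengths)
    sample_ids, sample_lengths = build_samples(sorted_lengths, step=3)
    for doc_id, true_len in enumerate(sorted_lengths):
        est = approx_length(doc_id, sample_ids, sample_lengths)
        print(doc_id, true_len, round(est, 1))
```

In this sketch only the sample points stay in memory instead of one length per document, so the estimates are exact at the sampled identifiers and approximate elsewhere.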
URL: http://www.springerlink.com/content/f308811285546ll8/
DOI: 10.1007/978-3-642-20161-5_66
Subject: Information Retrieval
H.3.3 Information Search and Retrieval



