PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Baeza-Yates R., Gionis A., Junqueira F., Murdock V., Plachouras V., Silvestri F. Design trade-offs for search engine caching. In: ACM Transactions on the Web, vol. 2 (4), article no. 20. Query log analysis: technology and ethics. Einat Amitay and Andrei Broder (eds.). ACM, 2008.
 
 
Abstract
(English)
In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
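To make the static vs. dynamic distinction in the abstract concrete, here is a minimal sketch of both policies applied to a cache of query results. It is not the article's algorithm or data: the function names, the toy query streams, and the choice of LRU as the dynamic eviction policy are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


def static_cache_hits(train, test, capacity):
    """Static cache: filled once with the most frequent training queries.

    The cache contents never change while the test stream is served.
    """
    cached = {q for q, _ in Counter(train).most_common(capacity)}
    return sum(1 for q in test if q in cached)


def lru_cache_hits(stream, capacity):
    """Dynamic cache: starts empty and evicts the least recently used query."""
    cache = OrderedDict()
    hits = 0
    for q in stream:
        if q in cache:
            hits += 1
            cache.move_to_end(q)  # mark as most recently used
        else:
            cache[q] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # drop the least recently used entry
    return hits


# Hypothetical toy query streams (not taken from the article's query log).
train = ["a", "b", "a", "c", "a", "b"]
test = ["a", "d", "b", "a", "e", "a"]

print(static_cache_hits(train, test, capacity=2))
print(lru_cache_hits(test, capacity=2))
```

On a skewed stream like this toy one, a static cache built from past frequencies can serve popular queries that a small LRU cache keeps evicting. How the two compare on real logs is the kind of trade-off the article measures.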
Subject
Caching
Web search
Query logs
H.3.3 Information Storage and Retrieval. Information Search and Retrieval. Search process
H.3.4 Information Storage and Retrieval. Systems and Software. Distributed systems, performance evaluation (efficiency and effectiveness)





 


For further information, contact: Librarian http://puma.isti.cnr.it
