Istituto di Scienza e Tecnologie dell'Informazione     
Ottaviano G., Tonellotto N., Venturini R. Optimal space-time tradeoffs for inverted indexes. In: WSDM'15 - Eighth ACM International Conference on Web Search and Data Mining (Shanghai, Popular Republic of China, 31 January - 6 February 2015). Proceedings, pp. 47 - 56. ACM, 2015.
Inverted indexes are usually represented by dividing posting lists into constant-sized blocks and representing them with an encoder for sequences of integers. Di erent encoders yield a di erent point in the space-time trade-o curve, with the fastest being several times larger than the most space-ecient. An important design decision for an index is thus the choice of the fastest encoding method such that the index ts in the available memory. However, a better usage of the space budget could be obtained by using faster encoders for frequently accessed blocks, and more space-ecient ones those that are rarely accessed. To perform this choice optimally, we introduce a linear time algorithm that, given a query distribution and a set of encoders, selects the best encoder for each index block to obtain the lowest expected query processing time respecting a given space constraint. To demonstrate the e ectiveness of this approach we perform an extensive experimental analysis, which shows that our algorithm produces indexes which are signi cantly faster than single-encoder indexes under several query processing strategies, while respecting the same space constraints.
URL: http://dl.acm.org/citation.cfm?id=2685297&CFID=748200543&CFTOKEN=15193674
DOI: 10.1145/2684822.2685297
Subject Compression
Knapsack Problems
Inverted Indexes
H.3.2 Information Storage
E.4 Data Compaction and Compression

Icona documento 1) Download Document PDF
Icona documento 2) Download Document PDF

Icona documento Open access Icona documento Restricted Icona documento Private


Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it

Valid HTML 4.0 Transitional