Istituto di Scienza e Tecnologie dell'Informazione     
Ottaviano G., Venturini R. Partitioned Elias-Fano indexes. In: SIGIR 2014 - 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Gold Coast, QLD, Australia, 6-11 July 2014). Proceedings, pp. 273 - 282. ACM, 2014.
The emph{Elias-Fano} representation of monotone sequences has been recently applied to the compression of inverted indexes, showing excellent query performance thanks to its efficient random access and search operations. While its space occupancy is competitive with some state-of-the-art methods such as $gamma$-$delta$-Golomb codes and PForDelta, it fails to exploit the local clustering that inverted lists usually exhibit, namely the presence of long subsequences of close identifiers. In this paper we describe a new representation based on partitioning the list into chunks and encoding both the chunks and their endpoints with Elias-Fano, hence forming a two-level data structure. This partitioning enables the encoding to better adapt to the local statistics of the chunk, thus exploiting clustering and improving compression. We present two partition strategies, respectively with fixed and variable-length chunks. For the latter case we introduce a linear-time optimization algorithm which identifies the minimum-space partition up to an arbitrarily small approximation factor. We show that our partitioned Elias-Fano indexes offer significantly better compression than plain Elias-Fano, while preserving their query time efficiency. Furthermore, compared with other state-of-the-art compressed encodings, our indexes exhibit the best compression ratio/query time trade-off.
URL: http://dl.acm.org/citation.cfm?doid=2600428.2609615
DOI: 10.1145/2600428.2609615
Subject Indexing

Icona documento 1) Download Document PDF
Icona documento 2) Download Document PDF

Icona documento Open access Icona documento Restricted Icona documento Private


Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it

Valid HTML 4.0 Transitional