Istituto di Scienza e Tecnologie dell'Informazione     
Gog S., Venturini R. Fast and compact hamming distance index. In: SIGIR'16 - 39th International ACM SIGIR conference on Research and Development in Information Retrieval (Pisa, Italy, 17-21 July 2016). Proceedings, pp. 285-294. ACM, 2016.
Searching for similar objects in a collection is a core task in many applications in databases, pattern recognition, and information retrieval. Because similarity-preserving hash functions such as SimHash exist, indexing these objects reduces to solving the Approximate Dictionary Queries problem: given a collection of fixed-size keys, build an index that efficiently retrieves all keys at Hamming distance at most k from a query key. In this paper we propose new solutions for the approximate dictionary queries problem. These solutions combine succinct data structures with an efficient representation of the keys, significantly reducing the space usage of the state-of-the-art solutions without introducing any time penalty. Finally, by exploiting the triangle inequality, we also significantly speed up the query time of the existing solutions.
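To make the problem statement concrete, here is a minimal Python sketch of an approximate dictionary query index. It is not the authors' data structure: it uses a common filter-and-verify approach based on the pigeonhole principle (split each key into k + 1 disjoint blocks; any key within distance k of the query must match it exactly on at least one block), and it implements neither the succinct representation nor the triangle-inequality speedup described above. It assumes keys are unsigned integers of a fixed width `bits`; the names `hamming` and `PigeonholeIndex` are hypothetical.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fixed-size keys stored as ints."""
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Hypothetical index answering "all keys within Hamming distance k of q".

    Each key is split into k + 1 disjoint bit blocks. If hamming(x, q) <= k,
    at most k blocks can differ, so at least one block of x equals the
    corresponding block of q. Exact lookups on every block therefore return
    a superset of the answer, which is then verified bit by bit.
    """

    def __init__(self, keys, bits: int, k: int):
        self.k = k
        m = k + 1
        # Spread `bits` as evenly as possible over m blocks: (shift, mask) pairs.
        base, extra = divmod(bits, m)
        self.blocks = []
        shift = 0
        for i in range(m):
            width = base + (1 if i < extra else 0)
            self.blocks.append((shift, (1 << width) - 1))
            shift += width
        self.keys = list(keys)
        # One hash table per block: block value -> ids of keys having it.
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, key in enumerate(self.keys):
            for table, (s, mask) in zip(self.tables, self.blocks):
                table[(key >> s) & mask].append(idx)

    def query(self, q: int):
        """Return the sorted distinct keys within Hamming distance k of q."""
        candidates = set()
        for table, (s, mask) in zip(self.tables, self.blocks):
            # .get avoids inserting empty lists into the defaultdict.
            candidates.update(table.get((q >> s) & mask, ()))
        # Verification step: discard false positives from the block filter.
        return sorted({self.keys[i] for i in candidates
                       if hamming(self.keys[i], q) <= self.k})


if __name__ == "__main__":
    keys = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
    index = PigeonholeIndex(keys, bits=8, k=1)
    print(index.query(0b10110010))  # prints [178, 179]
```

The design trade-off is typical of this family of indexes: more blocks mean shorter blocks and longer candidate lists to verify, while storing one table per block multiplies the space used for key identifiers, which is the kind of overhead a compact representation aims to reduce.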
Subject Indexing
H.3.3 INFORMATION STORAGE AND RETRIEVAL. Information Search and Retrieval
