PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Dohnal V., Gennaro C., Savino P., Zezula P. Separable Splits of Metric Data Sets. In: Sistemi Evoluti per Basi di Dati. Atti del IX Congresso Nazionale SEBD 2001 (Venezia, Italy, 27-29 June 2001), 45-62. Augusto Celentano, Letizia Tanca and Paolo Tiberio (eds.), 2001.
 
 
Abstract
(English)
In order to speed up retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types (such as image, video, or audio features) grows, the traditional partitioning techniques based on total ordering of data typically cannot be applied. We consider the problem of partitioning data collections from generic metric spaces, where no total ordering of objects exists and where only distances between pairs of objects can be determined. We study the elementary type of partitioning that splits a given collection into two well-separated subsets, allowing some objects to be excluded from the partitioning process. Five implementation techniques of separable splits are proposed and proved correct. The first two are simple extensions of the known ball partitioning and generalized hyperplane approaches; the third is an advanced hyperplane partitioning. The remaining two techniques are completely original and are based on elliptic and pseudo-elliptic geometric strategies. The effectiveness of all techniques is evaluated in terms of their ability to equalize the separable set sizes and to minimize the number of excluded objects. The proposed techniques are evaluated on three large data files.
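The idea of a separable split with excluded objects can be illustrated with a minimal sketch of ball partitioning. This is not the authors' implementation: the function name `ball_split`, the separation parameter `rho`, and the choice of the median distance as the ball radius are assumptions made for illustration. Objects within `rho` of the ball boundary are excluded, so by the triangle inequality any object kept inside the ball is more than `2 * rho` away from any object kept outside it.

```python
from statistics import median


def ball_split(objects, pivot, dist, rho):
    """Split `objects` around `pivot` into (inner, outer, excluded).

    Hypothetical sketch: the ball radius is the median distance to the
    pivot, and objects within `rho` of that radius are excluded.
    `dist` is assumed to be a metric (it satisfies the triangle inequality).
    """
    dists = [dist(pivot, o) for o in objects]
    radius = median(dists)  # assumed choice, aims at balanced set sizes
    inner, outer, excluded = [], [], []
    for obj, d in zip(objects, dists):
        if d <= radius - rho:
            inner.append(obj)      # well inside the ball
        elif d > radius + rho:
            outer.append(obj)      # well outside the ball
        else:
            excluded.append(obj)   # too close to the boundary
    return inner, outer, excluded


# Usage on one-dimensional points with the absolute difference as the metric.
points = list(range(11))
inner, outer, excluded = ball_split(points, 0, lambda a, b: abs(a - b), 1)
print(inner, outer, excluded)
```

A larger `rho` widens the gap between the two separable sets but increases the number of excluded objects, which is the trade-off the abstract describes.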
Subject
data partitioning
advanced hyperplane partitioning
H.3.3 Information Search and Retrieval



 


For further information, contact: Librarian http://puma.isti.cnr.it
