PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Orlando S., Lucchese C., Palmerini P., Perego R., Silvestri F. kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets. In: FIMI '03, Proceedings of the Workshop on Frequent Itemset Mining Implementations (Melbourne, Florida, USA, 19 December 2003). Bart Goethals, Mohammed Javeed Zaki (eds.). CEUR Workshop Proceedings, vol. 90, pp. 1-10, 2003.
 
 
Abstract
This paper presents the implementation of DCI++, an enhancement of DCI, a scalable algorithm for discovering frequent sets in large databases. The main contribution of DCI++ lies in a novel counting inference strategy, inspired by earlier results of Bastide et al. In addition, multiple heuristics and efficient data structures are used to adapt the algorithm's behavior to the features of the specific dataset being mined and of the computing platform used. DCI++ proves effective in mining both short and long patterns from a variety of datasets. We conducted a wide range of experiments on synthetic and real-world datasets, both in-core and out-of-core. The results show that the performance of DCI++ is not over-fitted to a particular case: it remains high on datasets with different characteristics.
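The abstract credits the counting inference in DCI++ to earlier results of Bastide et al. As a hedged illustration of that earlier idea only (the Pascal-style rule, not the novel DCI++ strategy nor its data structures), the sketch below adds counting inference to a plain level-wise miner. An itemset is a key pattern when its support is strictly lower than the support of each of its immediate subsets; a candidate with a non-key immediate subset is itself non-key, so its support equals the minimum support of its immediate subsets and no database scan is needed for it. The function name `mine_with_counting_inference` and the in-memory list of frozensets are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def mine_with_counting_inference(
    transactions: List[Itemset], min_support: int
) -> Tuple[Dict[Itemset, int], int]:
    """Hypothetical level-wise miner with Pascal-style counting inference.

    Illustrates the rule of Bastide et al.; it is NOT the DCI++ code.
    Returns the supports of all frequent itemsets and the number of
    candidates whose support was inferred instead of counted.
    """
    n = len(transactions)
    # Level 1: count single items with one scan of the data.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            single = frozenset([item])
            counts[single] = counts.get(single, 0) + 1
    frequent = {x: s for x, s in counts.items() if s >= min_support}
    # A 1-itemset is a key pattern unless it occurs in every transaction,
    # since its only immediate subset (the empty set) has support n.
    is_key = {x: s < n for x, s in frequent.items()}
    result: Dict[Itemset, int] = dict(frequent)
    inferred = 0
    level = list(frequent)
    k = 2
    while level:
        prev = set(level)
        # Candidate generation: join (k-1)-itemsets sharing k-2 items and
        # keep only unions whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        next_level = []
        for c in candidates:
            subsets = [c - {i} for i in c]
            min_sub = min(result[s] for s in subsets)
            if not all(is_key[s] for s in subsets):
                # Counting inference: a superset of a non-key pattern is
                # non-key, and its support is the minimum over its subsets.
                support, key = min_sub, False
                inferred += 1
            else:
                # All immediate subsets are keys: count with a data scan.
                support = sum(1 for t in transactions if c <= t)
                key = support < min_sub
            if support >= min_support:
                result[c] = support
                is_key[c] = key
                next_level.append(c)
        level = next_level
        k += 1
    return result, inferred
```

For example, with the transactions `abc`, `ab`, `ac`, `abc` and a minimum support of 2, item `a` occurs in every transaction and is therefore non-key, so the supports of `{a,b}`, `{a,c}` and `{a,b,c}` are inferred rather than counted; only `{b,c}` requires a scan at level 2.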
Subject: Frequent Patterns Mining; Algorithms; D.2.8 Metrics; H.2.8 Database Applications





 


For further information, contact: Librarian http://puma.isti.cnr.it
