Istituto di Scienza e Tecnologie dell'Informazione     
Orlando S., Perego R., Silvestri C. A new algorithm for gap constrained sequence mining. In: Proceedings of the 2004 ACM Symposium on Applied Computing (Nicosia, Cyprus, 2004), pp. 540-547. ACM Press, 2004.
The sequence mining problem consists in finding frequent sequential patterns in a database of time-stamped events. Several application domains require limiting the maximum temporal gap between events occurring in the input sequences. However, pushing such a constraint down into the mining process is problematic for most sequence mining algorithms. In this paper we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the difficulties usually associated with this kind of constraint. CCSM adopts an innovative approach based on k-way intersections of idlists to compute the support of candidate sequences. Our k-way intersection method is enhanced by an effective cache that stores intermediate idlists for future reuse. The reuse of intermediate results entails a surprising reduction in the actual number of join operations performed on idlists. CCSM has been experimentally compared with cSPADE, a state-of-the-art algorithm, on several synthetically generated datasets, obtaining better or similar results in most cases.
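The idea of computing support by joining idlists under a maximum-gap constraint, with a cache of intermediate idlists, can be illustrated with a minimal Python sketch. This is not the CCSM implementation described in the paper: it assumes single-item events, represents an idlist as a map from sequence id to the timestamps at which a pattern ends, and uses a plain dictionary as the cache. All names (`build_idlists`, `temporal_join`, `make_support_counter`) and the toy database are hypothetical.

```python
def build_idlists(db):
    """Map each item to {sequence_id: sorted timestamps at which it occurs}."""
    idlists = {}
    for sid, events in db.items():
        for t, item in events:
            idlists.setdefault(item, {}).setdefault(sid, []).append(t)
    for per_sid in idlists.values():
        for times in per_sid.values():
            times.sort()
    return idlists


def temporal_join(prefix_ends, item_times, max_gap):
    """Keep occurrences of the item that follow a prefix end within max_gap."""
    out = {}
    for sid, ends in prefix_ends.items():
        times = item_times.get(sid)
        if not times:
            continue
        kept = [t for t in times if any(0 < t - p <= max_gap for p in ends)]
        if kept:
            out[sid] = kept
    return out


def make_support_counter(db, max_gap):
    """Return (support, stats); support(pattern) counts matching sequences."""
    idlists = build_idlists(db)
    cache = {}            # pattern tuple -> idlist of its end timestamps
    stats = {"joins": 0}  # number of temporal joins actually performed

    def idlist(pattern):
        if pattern in cache:
            return cache[pattern]
        if len(pattern) == 1:
            result = idlists.get(pattern[0], {})
        else:
            # Extend the (possibly cached) prefix by the last item.
            stats["joins"] += 1
            result = temporal_join(
                idlist(pattern[:-1]), idlists.get(pattern[-1], {}), max_gap
            )
        cache[pattern] = result
        return result

    def support(pattern):
        return len(idlist(tuple(pattern)))

    return support, stats


# Toy database (assumed data): sequence id -> list of (timestamp, item).
db = {
    1: [(1, "a"), (2, "b"), (3, "c")],
    2: [(1, "a"), (5, "b"), (6, "c")],
    3: [(1, "a"), (2, "b"), (4, "d")],
}
support, stats = make_support_counter(db, max_gap=2)
print(support(["a", "b", "c"]), stats["joins"])  # 1 2
print(support(["a", "b", "d"]), stats["joins"])  # 1 3 (idlist of a, b reused)
```

In this sketch the second query performs only one new join because the idlist of the shared prefix is already cached; without the cache it would need two. Sequence 2 does not support the prefix a, b because the gap between the two events (4) exceeds the maximum gap of 2.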
URL: http://portal.acm.org/citation.cfm?doid=968014
Subject: Data mining
H.2.8 Data mining



For further information, contact: Librarian http://puma.isti.cnr.it
