Istituto di Scienza e Tecnologie dell'Informazione     
Orlando S., Palmerini P., Perego R., Silvestri F. Scheduling high performance data mining tasks on a data grid environment. In: Euro-Par 2002 Parallel Processing - 8th International Euro-Par Conference (Paderborn, Germany, August 2002). Proceedings, pp. 375-384. B. Monien, R. Feldmann (eds.). (Lecture Notes in Computer Science, vol. 2400). Springer, 2002.
Increasingly, the datasets used for data mining are becoming huge and physically distributed. Since the distributed knowledge discovery process is both data- and computation-intensive, the Grid is a natural platform for deploying a high performance data mining service. The focus of this paper is on the core services of such a Grid infrastructure. In particular, we concentrate on the design and implementation of a specialized broker that is aware of data source locations and of the resource needs of data mining tasks. Allocation and scheduling decisions are taken on the basis of performance cost metrics and models that exploit knowledge about previous executions and use sampling to acquire estimates of execution behavior.
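As a rough illustration of the kind of decision such a broker makes, the sketch below picks the site with the lowest estimated completion time, where the compute cost is extrapolated from a run on a sample of the data. It is not the cost model from the paper: the linear scaling assumption, the `Site` fields, and the function names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical, chosen only for this sketch.
    name: str
    speed: float           # relative compute speed (1.0 = reference machine)
    bandwidth_mb_s: float  # bandwidth from the data source; inf if data is local
    ready_at_s: float      # time at which the site finishes its queued work


def estimate_compute_s(sample_time_s, sample_fraction, speed):
    """Extrapolate the full run time from a sample run.

    Assumes run time grows linearly with dataset size, which is a
    simplification made for this example.
    """
    return sample_time_s / sample_fraction / speed


def pick_site(sites, dataset_mb, sample_time_s, sample_fraction):
    """Return (site, estimated completion time) with the lowest estimate."""
    def completion(site):
        transfer = dataset_mb / site.bandwidth_mb_s  # 0.0 when bandwidth is inf
        compute = estimate_compute_s(sample_time_s, sample_fraction, site.speed)
        return site.ready_at_s + transfer + compute

    best = min(sites, key=completion)
    return best, completion(best)


sites = [
    Site("local", speed=1.0, bandwidth_mb_s=float("inf"), ready_at_s=0.0),
    Site("remote_fast", speed=4.0, bandwidth_mb_s=100.0, ready_at_s=0.0),
]
best, eta = pick_site(sites, dataset_mb=1000.0, sample_time_s=250.0, sample_fraction=0.25)
print(best.name, eta)
```

With these made-up numbers, moving the data to the faster remote site is estimated to be cheaper than mining it where it sits. A slower link or a smaller speed gap would reverse the choice.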
Subject: High performance
Data Mining
H.2.8 Database Applications. Data mining



