PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Sebastiani F. Machine learning in automated text categorization. In: ACM Computing Surveys, vol. 34 (1) pp. 1-47. ACM, 2002.
 
 
Abstract
(English)
The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Subject
Machine learning
Text categorization
Text classification
H.3.1 Content Analysis and Indexing
H.3.3 Information Search and Retrieval
H.3.4 Systems and Software
I.2.6 Learning



 


For further information, contact: Librarian http://puma.isti.cnr.it
