PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Giorgetti D., Sebastiani F. Multiclass text categorization for automated survey coding. In: ACM Symposium on Applied Computing. SAC 2003 (Melbourne, Florida, 9-12 March 2003). Proceedings, pp. 798 - 802. ACM, 2003.
 
 
Abstract
Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
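To make the "learn from pre-coded answers, then code new answers" formulation concrete, here is a minimal sketch of the naïve Bayesian side of the approach: a multinomial naïve Bayes classifier with Laplace smoothing. It is an illustration under stated assumptions, not the authors' implementation. The whitespace tokenizer, the smoothing parameter `alpha`, the class name `NaiveBayesCoder`, and the toy answers and codes are all hypothetical and are not taken from the paper or its survey corpus. The multiclass SVM learner is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes survey coder (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant

    def fit(self, answers, codes):
        # Count how often each code occurs and how often each word occurs per code.
        self.code_counts = Counter(codes)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for answer, code in zip(answers, codes):
            tokens = tokenize(answer)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n = len(codes)
        return self

    def log_score(self, answer, code):
        # Unnormalized log posterior: log P(code) + sum over words of log P(word | code).
        score = math.log(self.code_counts[code] / self.n)
        denom = self.total_words[code] + self.alpha * len(self.vocab)
        for w in tokenize(answer):
            if w not in self.vocab:
                continue  # words never seen in training are ignored
            score += math.log((self.word_counts[code][w] + self.alpha) / denom)
        return score

    def predict(self, answer):
        # Assign the single code with the highest score (one code per answer).
        return max(self.code_counts, key=lambda c: self.log_score(answer, c))


# Toy, invented training data: open-ended answers with pre-assigned codes.
answers = [
    "I lost my job last year",
    "unemployment is the main problem",
    "taxes are too high",
    "the government should lower taxes",
]
codes = ["EMPLOYMENT", "EMPLOYMENT", "TAXATION", "TAXATION"]

model = NaiveBayesCoder().fit(answers, codes)
print(model.predict("high taxes"))  # TAXATION
print(model.predict("my job"))      # EMPLOYMENT
```

Because each answer receives exactly one code from the predefined set, the task is single-label multiclass classification, which is why `predict` takes an argmax over all codes rather than making independent yes/no decisions per code.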
Subjects:
text categorization
I.5.2 Classifier Design and Evaluation
I.2.6 Learning
H.3.3 Information Search and Retrieval
J.4 Sociology





 


For further information, contact: Librarian, http://puma.isti.cnr.it
