Istituto di Scienza e Tecnologie dell'Informazione     
Giorgetti D., Sebastiani F. Automatic survey coding by multiclass text categorization techniques. In: Journal of the American Society for Information Science and Technology, vol. 54 (14) pp. 1269 - 1277. Wiley, 2003.
Survey coding is the task of assigning a symbolic code from a prede.ned set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classi.cation of new answers. In this article we experiment with two different learning techniques: one based on naive Bayesian classification, and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
Subject learning
I.5.2 Classifier Design and Evaluation
I.2.6 Learning
H.3.3 Information Search and Retrieval
J.4 Sociology

Icona documento 1) Download Document PDF

Icona documento Open access Icona documento Restricted Icona documento Private


Per ulteriori informazioni, contattare: Librarian http://puma.isti.cnr.it

Valid HTML 4.0 Transitional