Istituto di Scienza e Tecnologie dell'Informazione     
Avancini H., Lavelli A., Sebastiani F., Zanoli R. Automatic expansion of domain-specific lexicons by term categorization. In: ACM Transactions on Speech and Language Processing, vol. 3 (1) pp. 1 - 30. ACM, 2006.
We discuss an approach to the automatic expansion of domain-specific lexicons, i.e., to the problem of extending, for each c_i in a predefined set C = {c_1, ..., c_m} of semantic domains, an initial lexicon L_i^0 into a larger lexicon L_i^1. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem, in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as 'implicit' representations for our terms.
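The dual representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the seed lexicons, and the function names (term_vectors, expand) are hypothetical, and a simple nearest-centroid rule with cosine similarity stands in for the boosting-based learner used in the paper. Each unlabeled term is assigned to at most one domain here, which is a simplifying assumption.

```python
from collections import Counter
import math

# Toy corpus (hypothetical); the paper uses Reuters Corpus Volume 1.
docs = [
    "bank loan interest credit",
    "loan credit mortgage bank",
    "match goal striker referee",
    "goal referee penalty match",
    "mortgage interest bank",
    "striker penalty goal",
]


def term_vectors(documents):
    """Represent each term as a vector with one dimension per document."""
    vectors = {}
    for j, text in enumerate(documents):
        for term, count in Counter(text.split()).items():
            vectors.setdefault(term, [0.0] * len(documents))[j] = float(count)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vecs):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def expand(seed_lexicons, documents):
    """Extend each seed lexicon with the unlabeled terms closest to its centroid.

    Stand-in for the paper's boosting-based term classifiers: the seed
    lexicons play the role of training data, and a nearest-centroid rule
    plays the role of the learned classifier.
    """
    vectors = term_vectors(documents)
    centroids = {
        domain: centroid([vectors[t] for t in terms if t in vectors])
        for domain, terms in seed_lexicons.items()
    }
    labeled = {t for terms in seed_lexicons.values() for t in terms}
    expanded = {domain: set(terms) for domain, terms in seed_lexicons.items()}
    for term, vec in vectors.items():
        if term in labeled:
            continue
        best = max(centroids, key=lambda d: cosine(vec, centroids[d]))
        if cosine(vec, centroids[best]) > 0.0:
            expanded[best].add(term)
    return expanded


# Hypothetical initial lexicons, one per semantic domain.
seeds = {"finance": {"bank", "loan"}, "sport": {"goal", "match"}}
expanded = expand(seeds, docs)
for domain in sorted(expanded):
    print(domain, sorted(expanded[domain]))
```

In this sketch a term's vector records the documents it occurs in, so terms that co-occur with the seed terms of a domain end up close to that domain's centroid. A real setup would replace the centroid rule with a trained classifier per domain and would allow a term to belong to several domains.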
URL: http://www.math.unipd.it/~fabseb60/Publications/TSLP06.pdf
Subject: Lexicons
Text classification
Machine learning
I.5.2 Classifier design and evaluation
I.2.7 Natural Language Processing
H.3.1 Content Analysis and Indexing



For further information, contact: Librarian http://puma.isti.cnr.it
