Istituto di Scienza e Tecnologie dell'Informazione     
Berardi G., Esuli A., Fagni T., Sebastiani F. Classifying Websites by industry sector: a study in feature design. In: SAC'15 - 30th Annual ACM Symposium on Applied Computing (Salamanca, ES, 13-17 April 2015). Proceedings, pp. 1053 - 1059. ACM, 2015.
Classifying companies by industry sector is an important task in finance, since it allows investors and research analysts to analyse specific subsectors of local and global markets for investment monitoring and planning purposes. Traditionally this classification activity has been performed manually, by dedicated specialists carrying out in-depth analysis of a company's public profile. However, this is increasingly unsuitable in today's globalised markets, in which new companies spring up, old companies cease to exist, and existing companies refocus their efforts on different sectors at an astounding pace. As a result, tools for performing this classification automatically are increasingly needed. We address the problem of classifying companies by industry sector via the automatic classification of their websites, since the latter provide rich information about the nature of the company and the market segment it targets. We have built a website classification system and tested its accuracy on a dataset of more than 20,000 company websites classified according to a 2-level taxonomy of 216 leaf classes explicitly designed for market research purposes. Our experimental study provides insights into which types of features are the most useful for this classification task.
URL: http://dl.acm.org/citation.cfm?id=2695722&CFID=734381158&CFTOKEN=34893976
DOI: 10.1145/2695664.2695722
Subject: Website classification



For further information, contact: Librarian http://puma.isti.cnr.it
