Istituto di Informatica e Telematica     
Saracino A., Martinelli F., Sheikhalishahi M., Mejri M., Tawbi N. Digital Waste Sorting: A Goal-Based, Self-Learning Approach to Label Spam Email Campaigns. In: STM 2015 (Security and Trust Management) (Vienna, Austria, 21-22/09 2015). Proceedings, pp. 3 - 19. (Lecture Notes in Computer Science, vol. 9331). Springer Verlag, 2015.
Fast analysis of correlated spam emails may be vital to finding and prosecuting spammers who commit cybercrimes such as phishing and online fraud. This paper presents a self-learning framework that automatically divides and classifies large amounts of spam emails into correlated, labeled groups. Building on large datasets collected daily through honeypots, the emails are first divided into homogeneous groups of similar messages (campaigns), each of which can be related to a specific spammer. Each campaign is then associated with a class that specifies the goal of the spammer, e.g. phishing, advertisement, etc. The proposed framework uses a categorical clustering algorithm to group similar emails, and a classifier to subsequently label each email group. The main advantage of the proposed framework is that it can be used on large spam email datasets for which no prior knowledge is provided. The approach has been tested on more than 3200 real and recent spam emails, divided into more than 60 campaigns, reporting a classification accuracy of 97% on the classified data.
© Springer International Publishing Switzerland 2015. S. Foresti (Ed.): STM 2015, LNCS 9331, pp. 3-19, 2015.
DOI: 10.1007/978-3-319-24858-5_1
Subject: Spam detection; Machine learning; K.6.5 Security and protection


