Istituto di Scienza e Tecnologie dell'Informazione     
Tonazzini A., Bedini L., Salerno E. Independent Component Analysis for Document Restoration. Submitted to: International Journal on Document Analysis and Recognition, Technical report, 2003.
In digital images of many documents, the legibility of the text is often compromised by artifacts in the background. These can derive from many kinds of degradation, such as spots, underwritings, and show-through or bleed-through effects. Thresholding techniques for removing the background can perform well on black-and-white documents, but they are not effective for gray-level or color documents, since the color values of the background can be very close to those of the text. For the specific problem of bleed-through/show-through, some work has been done, mainly based on comparing the front and back of the page. This, however, requires a preliminary registration of the two images. In this paper, we propose a novel approach, based on viewing the problem as one of separating overlapped texts, and then reformulating it as a Blind Source Separation problem, approached through Independent Component Analysis techniques. Our method uses the spectral components of the image at different bands, so that there is no need for registration. Examples of bleed-through cancellation and recovery of underwriting from palimpsests are provided.
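The separation idea can be illustrated with a minimal sketch. It assumes a linear instantaneous mixing model, in which each spectral channel is a weighted sum of the overlapped patterns. The abstract does not say which ICA algorithm is used, so the sketch swaps in a basic symmetric FastICA with a tanh nonlinearity, written in NumPy. The synthetic sources, the mixing matrix `A`, the image size, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def whiten(X, n_components):
    """Center the channels and project them onto unit-variance principal axes."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_components]
    K = (eigvec[:, order] / np.sqrt(eigval[order])).T  # (n_components, channels)
    return K @ Xc

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    eigval, eigvec = np.linalg.eigh(W @ W.T)
    return eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T @ W

def fast_ica(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from the rows of X (channels x pixels)."""
    Z = whiten(X, n_components)
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for every row of W
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Synthetic stand-ins for two overlapped "ink" patterns (assumed, not real scans):
# row 0 plays the main text, row 1 the interfering pattern.
rng = np.random.default_rng(0)
n_pixels = 128 * 128
S = np.vstack([rng.random(n_pixels) < 0.15,
               rng.random(n_pixels) < 0.10]).astype(float)

# Hypothetical mixing matrix: each row is one spectral channel (e.g. R, G, B).
A = np.array([[0.9, 0.3],
              [0.6, 0.5],
              [0.4, 0.8]])
X = A @ S  # observed channels, shape (3, n_pixels)

Y = fast_ica(X, n_components=2)

# ICA recovers sources only up to order, sign, and scale, so compare by
# absolute correlation between true sources (rows) and estimates (columns).
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(np.round(corr, 2))
```

Because the channels come from a single multispectral acquisition of the same page side, they are already pixel-aligned, which is why no registration step appears in the sketch. On a real scan, `X` would be built by flattening each color channel of the image into one row.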
Subject digital images
I.4.8 Scene Analysis: Color
I.5 PATTERN RECOGNITION I.5.4 Applications: Text processing
I.7 DOCUMENT AND TEXT PROCESSING I.7.5 Document Capture: Document



