PUMA
Istituto di Scienza e Tecnologie dell'Informazione     
Tonazzini A., Gerace I., Martinelli F. Document image restoration and analysis as separation of mixtures of patterns: from linear to non-linear models. In: Bahadir K. Gunturk, Xin Li (eds.). (Digital Imaging and Computer Vision Series). Boca Raton: CRC Press, 2012.
 
 
Abstract
(English)
The conservation, readability and content analysis of ancient documents are often compromised by the many different kinds of damage they have suffered over time, which continue to cause progressive decay. Natural ageing, usage, poor storage conditions, humidity, molds, insect infestations and fires are the most widespread degradation factors. In addition, the materials used in the original production of the documents, i.e. paper or parchment and inks, are usually highly variable in consistency and characteristics. All these factors combine to cause ink diffusion and fading, seeping of ink from the reverse side (bleed-through distortion), transparency from either the reverse side or subsequent pages (show-through distortion), spots, noise, low contrast, and unfocused, faint, fragmented or joined characters. Furthermore, these defects usually vary across the document. These problems are common to the majority of governmental, historical, ecclesiastical and commercial archives in Europe, so finding a remedy would have an enormous social and technological impact. Digital imaging can play a fundamental role in this respect. Indeed, it is an essential tool for generating digital archives that ensure the accessibility and conservation of documents, especially rare or very important historical documents whose fragility prevents direct access by scholars and historians. Moreover, OCR processing for automatic transcription and indexing facilitates access to the digital archives and the retrieval of information. Finally, a very common need is to improve readability for interested scholars. Often, digital images of documents are acquired only in grayscale or, at best, in the visible range of the spectrum, because the corresponding acquisition equipment is more widely available. However, owing to specific damage, some documents may be very difficult to read with the naked eye.
This particularly concerns documents produced during the XVI and XVII centuries, due to the corrosion, fading, seeping and diffusion of the ink used (mostly iron-gall), and more recent documents, due to the poor quality of the paper that came into use after the XIX century. Furthermore, interesting features are often barely detectable in the original color document, while revealing the whole content is an important aid to scholars interested in dating the document, establishing its origin, or reading hidden text it may contain. For example, in the case of palimpsests, usually ancient manuscripts that have been erased and then rewritten, the goal is to enhance the traces of the original underwriting and let them "emerge". Thus, additional information can sometimes be obtained from images taken at non-visible wavelengths, for instance in the near-infrared and ultraviolet ranges. Alternatively, or in conjunction with multispectral/hyperspectral acquisitions, digital image processing techniques can be used to enhance the readability of the document contents and to seek out new information.
URL: http://www.crcpress.com/product/isbn/9781439869550
Subject: Document Analysis and Restoration
Blind Source Separation
Image regularization
I.4 IMAGE PROCESSING AND COMPUTER VISION



 


For further information, contact: Librarian http://puma.isti.cnr.it
