Publisher: Edinburgh University Press
E-ISSN: 2053-7832
ISSN: 0957-0144
Source: History and Computing, Vol. 9, Iss. 1-3, 1997-01, pp. 58-77
Abstract
In principle, printed source material should be made machine-readable with systems for Optical Character Recognition (OCR), rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources that can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.