Title : Segmentation of Text in Printed Pages
Name  : Ngin Choo, Lim (lngincho@stanford.edu)

Description
-----------
Printed pages contain text as well as pictures and images. The purpose of this project is to segment the text blocks from the non-text blocks (pictures and images). There are 2 main parts:

a) Block segmentation in printed pages. This step picks out the information content of the page as blocks. To do this, I will first smear the digitized page and then apply edge detection to determine the regions that carry information. The parameter that can affect performance is the size of the smearing neighbourhood.

b) Classification of text and non-text regions. DCT-based and edge-based classification techniques have already been developed. I plan to exploit the regularity of text blocks (regular spacing between text lines, words and characters) for the classification. This regularity might be reflected in the auto-correlation matrix of a text block.
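
To make the two parts more concrete, here is a minimal Python sketch under stated assumptions; it is not the project's implementation. It assumes the page is a binary NumPy array (1 = ink, 0 = background), interprets "smearing" as filling short background runs between ink pixels along each row, and replaces the full auto-correlation matrix with a simpler 1-D auto-correlation of the row projection profile. The function names `smear_rows` and `line_regularity` and the parameters `max_gap` and `min_lag` are hypothetical.

```python
import numpy as np


def smear_rows(binary, max_gap):
    """Fill background runs of length <= max_gap that lie between ink pixels.

    `binary` is a 2-D array with 1 = ink and 0 = background (assumption).
    `max_gap` plays the role of the smearing neighbourhood.
    """
    out = binary.copy()
    for row in out:  # each row is a view, so writes go into `out`
        ink = np.flatnonzero(row)
        for a, b in zip(ink[:-1], ink[1:]):
            # b - a - 1 is the number of background pixels between a and b
            if 1 < b - a <= max_gap + 1:
                row[a:b] = 1
    return out


def line_regularity(binary, min_lag=2):
    """Return (lag, score) of the strongest periodicity in the row profile.

    The row profile counts ink pixels per row. Regularly spaced text lines
    should give a normalized auto-correlation peak near the line pitch;
    a block without such structure should score lower.
    """
    profile = binary.sum(axis=1).astype(float)
    profile -= profile.mean()
    n = profile.size
    # Keep non-negative lags only; index 0 is lag 0.
    ac = np.correlate(profile, profile, mode="full")[n - 1:]
    if ac[0] == 0 or n // 2 < min_lag:
        return 0, 0.0  # flat profile or block too short: no periodicity
    ac = ac / ac[0]
    lags = np.arange(min_lag, n // 2 + 1)
    best = lags[np.argmax(ac[lags])]
    return int(best), float(ac[best])


# Synthetic "text" block: 3-row-thick lines repeating every 8 rows.
text_block = np.zeros((40, 60), dtype=np.uint8)
for top in range(0, 40, 8):
    text_block[top:top + 3, :] = 1

# Synthetic "picture" block: solid ink, so the row profile is flat.
picture_block = np.ones((40, 60), dtype=np.uint8)

text_lag, text_score = line_regularity(text_block)
pic_lag, pic_score = line_regularity(picture_block)
```

Vertical smearing can be obtained by applying `smear_rows` to the transposed array. A threshold on the score from `line_regularity` would then be one possible text/non-text decision rule; the threshold value itself would have to be tuned on real scanned pages.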