This paper presents a couple of HMM-based approaches for distinguishing text from graphics. The recognizer presented is probably similar to the one implemented by Microsoft, as described in the earlier Patel et al. paper.
This paper does a good job of discussing some of the challenges of text vs shape classification, especially the heavy bias toward text in most of the data that the authors collected.
The text, it has bias. The gaps. They are gappy. I liked the discussion of the text vs. shape problem, and the gap and time features were interesting.