Sunday, December 12, 2010

Reading #30: Tahuti

  • Hammond and Davis introduce Tahuti, a geometric recognition system for UML diagrams. Among other things, Tahuti can recognize a number of different arrows and arrowheads that are part of the UML domain.

    Since I find UML to be an unnecessarily complicated and unnatural representation of a software system, I don't really care for Tahuti either.

Reading #29: Scratch Input

  • Scratch Input is a system that does simple gesture recognition using sound data. The authors use a sensor made from a modified stethoscope to turn any surface into a gesture input area. Users can scratch the surface with a fingernail or some other type of stylus. Only a couple of gestures are currently supported.

    Using only one sensor, Scratch Input is very limited in its ability to distinguish between gestures. With two or three sensors, the authors should be able to turn the sound data into a real sketch.

Reading #28: iCanDraw?

  • Dixon et al. introduce iCanDraw?, a system for teaching users to draw a realistic human face. The system uses computer vision techniques to automatically create a template from any image of a person's face.

    This paper does a good job of presenting "errors" to the user, and teaching the user how to correct them.

Reading #27: KSketch

  • Davis et al. present a sketch-based system for creating PowerPoint-like animations. Users sketch objects and then create animations for those objects, such as translation, scaling, and rotation.

    Since KSketch is implemented in C#, it would be cool to see it integrated into PowerPoint as an alternative, since the authors show that users prefer it to the animation methods PowerPoint provides.

Reading #26: Picturephone

    This reading has been marked as a duplicate of reading #24.

Reading #25: Descriptor for Image Retrieval

  • Eitz et al. present an image descriptor that works both on a sketch (line drawing) and a processed image where the edges have been enhanced and isolated. The descriptor can be used to do sketch-based search over a large database of images.

    Merging the sketch and image worlds is a pretty cool idea. If I know vaguely what an image looks like (or what I want an image of), then I should be able to draw a quick sketch of that image and get a real image as the result.

Reading #24: Games for Sketch Data Collection

  • Johnson and Plimmer present several games that can be used to collect sketch data from users. In most games, sketches are created based on a textual description; participants are encouraged to make good sketches because that's how they will win the games.

    I wish SOUSA was this fun.

Reading #23: InkSeine

  • Hinckley et al. introduce InkSeine, a system for bringing non-sketch items into sketches. While a user is taking notes, she can use in situ search to bring in external items, all with the pen in a natural way.

    InkSeine seems like a good way to pair notetaking with research, and it's nice not to have to switch back and forth from pen to keyboard to do both tasks.

Reading #22: Plushie

  • Plushie is a system for turning a Teddy (Reading #21) model into designs that can be printed out to make a real-life plush toy that looks just like the model. The Teddy interface is augmented with some operations that will help the user more easily sew the pattern together.

    This is also a neat system, though my interest in 3D sketching and plush-toy creation is pretty limited.

Reading #21: Teddy

  • Igarashi et al. introduce Teddy, a system for turning 2D sketches into 3D models of plush toys. Teddy also introduces some interaction techniques for editing the models.

    Teddy is a fun system to play with, although it's kind of frustrating if you're as bad of an artist as I am.

Reading #20: MathPad2

  • MathPad2 is a cool interactive system that lets students visualize and work on math problems. The recognition is largely handled by Microsoft's handwriting recognizer, but LaViola and Zeleznik introduce a cool trick for grouping characters in a sketch, along with some neat interaction gestures (like tapping).

    After working so much on Mechanix, I feel like I can really relate to MathPad2, a system with similar goals.

Reading #19: Conditional Random Fields

  • Qi et al. present a grouping approach based on Conditional Random Fields. Conditional random fields are cool because they can take into account both local features of a stroke and also the stroke's interactions with the other strokes nearby.

    There's a great payoff for learning it all, but the math is complicated, and it's hard to stay focused with so many equations all over the place.
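For reference, the core idea fits in one formula. This is the generic pairwise CRF form, not necessarily the exact parameterization used in the paper:

```latex
p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}
  \exp\!\Big( \sum_{i} \sum_{k} \lambda_k\, f_k(y_i, \mathbf{x})
            \;+\; \sum_{(i,j) \in E} \sum_{l} \mu_l\, g_l(y_i, y_j, \mathbf{x}) \Big)
```

The f_k are local features of stroke i, the g_l score joint labels of neighboring strokes (i, j), and Z(x) normalizes over all labelings. That one exponent is exactly the "local features plus interactions" combination described above.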

Reading #18: Spatial Recognition and Grouping

  • Shilman and Viola present an application of AdaBoost to simultaneously approach the grouping and recognition problems. The idea is simple: a group of strokes belong together if it can be recognized as something.

    It's nice to see an approach that doesn't just assume the grouping problem will be solved by someone else. There are so many symbol recognition papers that make such an assumption, but not very many that actually contribute toward removing it.

Reading #17: Distinguishing Text from Graphics

  • This paper presents a couple of HMM-based approaches for distinguishing text from graphics. The recognizer presented is probably similar to the one implemented by Microsoft, as described in the Patel et al. paper (Reading #13).

    This paper does a good job of discussing some of the challenges of text vs shape classification, especially the heavy bias toward text in most of the data that the authors collected.

Reading #16: Graph-Based Symbol Recognizer

  • Lee et al. present a graph-based system for combining primitives into symbols and matching them to templates. They investigate four approaches to computing the best matching between primitives in the symbol to be recognized and the primitives in the template.

    This paper would probably benefit from, well, taking a class like Introduction to Search 101. Stochastic search is like a super-weakened version of simulated annealing. Error-based search is just a heuristic search, but it doesn't use any framework for heuristic search like A*. The authors might consider a real implementation of simulated annealing, or maybe beefing up greedy search into a beam search.

    There's a polynomial-time exact solution to the matching problem. It's often slower than search techniques, but it does provide a much better solution.
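If the primitive-to-primitive matching is one-to-one with additive costs, this is the classic assignment problem, which the Hungarian algorithm solves exactly in polynomial (O(n^3)) time. A brute-force sketch (fine only for tiny symbols; the cost matrix below is invented for illustration) shows the objective being optimized:

```python
from itertools import permutations

def best_matching(cost):
    """Exhaustively find the minimum-cost one-to-one matching between
    primitives of the unknown symbol (rows) and the template (columns).
    Exponential in n; the Hungarian algorithm solves the same problem
    in O(n^3) for realistic symbol sizes."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

# Hypothetical 3x3 dissimilarity matrix between primitives
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = best_matching(cost)
```

Swapping the `min` over permutations for `scipy.optimize.linear_sum_assignment` (or any Hungarian implementation) gives the polynomial-time version with the same answer.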

Reading #15: Image-Based Trainable Symbol Recognizer

  • Kara and Stahovich present a vision-based trainable symbol recognizer. The recognizer is scale, translation, and rotation invariant, and runs very quickly. The system is an instance-based classifier, so it is easy to add new classes or new training instances of already-defined classes.

    This paper presents a good symbol recognizer that's pretty easy to implement. It's also a good introduction to a number of interesting distance metrics.
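One of those metrics, the modified Hausdorff distance, is easy to sketch. In the real system the point sets come from rasterized, normalized symbol templates; this simplified version skips those preprocessing details:

```python
import math

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two 2D point sets:
    the larger of the two directed mean nearest-neighbor distances.
    Unlike the plain Hausdorff distance (which takes the max over
    points), averaging makes it less sensitive to outlier points."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def mean_nearest(X, Y):
        return sum(min(dist(x, y) for y in Y) for x in X) / len(X)

    return max(mean_nearest(A, B), mean_nearest(B, A))

d_same = modified_hausdorff([(0, 0), (1, 0)], [(0, 0), (1, 0)])
d_far = modified_hausdorff([(0, 0)], [(3, 4)])
```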

Reading #14: Entropy

  • Bhat presents a system for distinguishing between text and shape strokes based on the entropy of the strokes. Strokes are translated into a string of characters representing the direction the stroke is heading. Letters are added as the stroke changes direction, with special markers for the endpoints. Then a simple definition of zero-order entropy is applied to those strings. Bhat also uses the gzip library to provide a higher-order measure of entropy.

    Bhat further improves on the accuracy of the Patel et al. system from Reading #13.
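The zero-order version is only a few lines. The eight-letter alphabet and quantization below are a simplified stand-in for the paper's encoding (endpoint markers omitted); wiggly text strokes produce more varied strings, and thus higher entropy, than smooth shape strokes:

```python
import math
from collections import Counter

def direction_string(points):
    """Quantize each segment's heading to the nearest of 8 compass
    directions, labeled A-H (a simplified direction alphabet)."""
    letters = "ABCDEFGH"
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        out.append(letters[round(angle / (2 * math.pi) * 8) % 8])
    return "".join(out)

def zero_order_entropy(s):
    """Shannon entropy of the letter distribution, in bits per letter."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

line = direction_string([(0, 0), (1, 0), (2, 0)])      # straight stroke
zigzag = direction_string([(0, 0), (1, 1), (2, 0), (3, 1)])
```

A straight line maps to a repeated letter (entropy 0), while the zigzag mixes directions and scores higher.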

Reading #13: Ink Features for Diagram Recognition

  • Patel et al. present a number of features for discriminating between text and shape strokes, and they feed those features into a decision tree. Similar to other approaches, the accuracy is biased toward text strokes, but the Patel system improves on previous systems.

    The Patel et al. system is very simple and computationally efficient, yet it performs the best out of all systems compared. I would have liked to see more discussion of why the authors chose to use rpart instead of other decision tree algorithms like C4.5 or algorithms specifically intended for feature subset selection.
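Part of why the system is so computationally cheap is that a trained decision tree is just nested threshold tests at classification time. The feature names and thresholds below are invented for illustration, not the values rpart actually learned in the paper:

```python
def classify_stroke(features):
    """Toy two-level decision tree over ink features, in the spirit of
    the paper's rpart model. Both splits are hypothetical: short,
    curvy strokes are called text, everything else shape."""
    if features["length"] < 100:        # short strokes tend to be text
        if features["curvature"] > 0.5:  # text strokes curve a lot
            return "text"
        return "shape"
    return "shape"

label_a = classify_stroke({"length": 50, "curvature": 0.8})
label_b = classify_stroke({"length": 150, "curvature": 0.8})
```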

