tag:blogger.com,1999:blog-38556085811617152662024-03-08T03:01:41.720-08:00Sketch Recognition Fall 2010Unknownnoreply@blogger.comBlogger31125tag:blogger.com,1999:blog-3855608581161715266.post-33518636033578556462010-12-12T15:56:00.000-08:002010-12-12T15:56:00.828-08:00Reading #30: Tahuti<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Hammond and Davis introduce Tahuti, a geometric recognition system for UML diagrams. Among other things, Tahuti can recognize a number of different arrows and arrowheads that are part of the UML domain.
</p>
</div>
<div class="discussion">
<p>
Since I find UML to be an unnecessarily complicated and unnatural representation of a software system, I don't really care for Tahuti either.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-19062693396477876112010-12-12T15:53:00.000-08:002010-12-12T15:53:09.541-08:00Reading #29: Scratch Input<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Scratch Input is a system that does simple gesture recognition using sound data. The authors use a sensor made from a modified stethoscope to turn any surface into a gesture input area. Users can scratch the surface with a fingernail or some other type of stylus. Only a couple of gestures are currently supported.
</p>
</div>
<div class="discussion">
<p>
Using only one sensor, Scratch Input is very limited in its ability to distinguish between gestures. With two or three sensors, the authors should be able to turn the sound data into a real sketch.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-5264886740228365742010-12-12T15:42:00.000-08:002010-12-12T15:42:42.591-08:00Reading #28: iCanDraw?<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Dixon et al. introduce iCanDraw?, a system for teaching users to draw a realistic human face. The system uses computer vision techniques to automatically create a template from any image of a person's face.
</p>
</div>
<div class="discussion">
<p>
This paper does a good job of presenting "errors" to the user, and teaching the user how to correct them.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-75793300050973315362010-12-12T15:38:00.000-08:002010-12-12T15:38:25.951-08:00Reading #27: KSketch<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Davis et al. present a sketch-based system for creating PowerPoint-like animations. Users sketch objects and then create animations for those objects, such as translation, scaling, and rotation.
</p>
</div>
<div class="discussion">
<p>
Since KSketch is implemented in C#, it would be cool to see it integrated into PowerPoint as an alternative animation interface, especially since the authors show that users prefer it to the methods PowerPoint provides.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-71800764720088477702010-12-12T15:30:00.001-08:002010-12-12T15:30:29.331-08:00Reading #26: PicturephoneThis reading has been marked as a duplicate of reading #24.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-20783639538170442532010-12-12T15:29:00.000-08:002010-12-12T15:29:58.496-08:00Reading #25: Descriptor for Image Retrieval<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Eitz et al. present an image descriptor that works both on a sketch (line drawing) and a processed image where the edges have been enhanced and isolated. The descriptor can be used to do sketch-based search over a large database of images.
</p>
</div>
<div class="discussion">
<p>
Merging the sketch and image worlds is a pretty cool idea. If I know vaguely what an image looks like (or what I want an image of), then I should be able to draw a quick sketch of that image and get a real image as the result.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-84282614154439153742010-12-12T15:26:00.000-08:002010-12-12T15:26:36.798-08:00Reading #24: Games for Sketch Data Collection<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Johnson and Plimmer present several games that can be used to collect sketch data from users. In most games, sketches are created based on a textual description; participants are encouraged to make good sketches because that's how they will win the games.
</p>
</div>
<div class="discussion">
<p>
I wish SOUSA was this fun.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-62564484772731891792010-12-12T15:20:00.000-08:002010-12-12T15:20:44.114-08:00Reading #23: InkSeine<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Hinckley et al. introduce InkSeine, a system for bringing non-sketch items into sketches. While a user is taking notes, she can use the in situ search to pull in external items, all with the pen, in a natural way.
</p>
</div>
<div class="discussion">
<p>
InkSeine seems like a good way to pair notetaking with research, and it's nice not to have to switch back and forth from pen to keyboard to do both tasks.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-681508484930075202010-12-12T15:17:00.000-08:002010-12-12T15:17:22.296-08:00Reading #22: Plushie<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Plushie is a system for turning a Teddy (Reading #21) model into designs that can be printed out to make a real-life plush toy that looks just like the model. The Teddy interface is augmented with some operations that will help the user more easily sew the pattern together.
</p>
</div>
<div class="discussion">
<p>
This is also a neat system, though my interest in 3D sketching and plush-toy creation is pretty limited.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-5833368112569283902010-12-12T15:13:00.000-08:002010-12-12T15:13:51.021-08:00Reading #21: Teddy<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Igarashi introduces Teddy, a system for turning 2D sketches into 3D models of plush toys. Teddy also introduces some interaction techniques for editing the models.
</p>
</div>
<div class="discussion">
<p>
Teddy is a fun system to play with, although it's kind of frustrating if you're as bad an artist as I am.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-5569022536743326002010-12-12T15:02:00.000-08:002010-12-12T15:02:01.178-08:00Reading #20: Mathpad2<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Mathpad2 is a cool interactive system that lets students visualize and work on math problems. The recognition is largely handled by Microsoft's handwriting recognizer, but LaViola and Zeleznik introduce a cool trick for grouping characters in a sketch, along with some neat interaction gestures (like tapping).
</p>
</div>
<div class="discussion">
<p>
After working so much on Mechanix, I feel like I can really relate to Mathpad2, a system with similar goals.
</p>
</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-13086230314322302172010-12-12T14:58:00.000-08:002010-12-12T14:58:24.159-08:00Reading #19: Conditional Random Fields<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Qi et al. present a grouping approach based on Conditional Random Fields. Conditional random fields are cool because they can take into account both local features of a stroke and also the stroke's interactions with the other strokes nearby.
</p>
</div>
<div class="discussion">
<p>
There's a great payoff for learning it all, but the math is so complicated that I can't stay focused with so many equations all over the place.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-8644527126298973152010-12-12T14:50:00.000-08:002010-12-12T14:50:43.008-08:00Reading #18: Spatial Recognition and Grouping<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Shilman and Viola present an application of AdaBoost that tackles the grouping and recognition problems simultaneously. The idea is simple: a group of strokes belongs together if it can be recognized as something.
</p>
</div>
<div class="discussion">
<p>
It's nice to see an approach that doesn't just assume the grouping problem will be solved by someone else. So many symbol recognition papers make that assumption, but very few actually contribute anything toward discharging it.
</p>
</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-80159235497497609922010-12-12T14:47:00.000-08:002010-12-12T14:47:05.405-08:00Reading #17: Distinguishing Text from Graphics<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
This paper presents a couple of HMM-based approaches for distinguishing text from graphics. The recognizer presented is probably similar to the one implemented by Microsoft, as described in the Patel et al. paper (Reading #13).
</p>
</div>
<div class="discussion">
<p>
This paper does a good job of discussing some of the challenges of text vs shape classification, especially the heavy bias toward text in most of the data that the authors collected.
</p>
</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-19456595871999354672010-12-12T14:42:00.000-08:002010-12-12T14:42:47.050-08:00Reading #16: Graph based symbol recognizer<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Lee et al. present a graph-based system for combining primitives into symbols and matching them to templates. They investigate four approaches to computing the best matching between primitives in the symbol to be recognized and the primitives in the template.
</p>
</div>
<div class="discussion">
<p>
This paper would probably benefit from, well, taking a class like Introduction to Search 101. The stochastic search is like a weakened version of simulated annealing. The error-driven search is just a heuristic search, but without any of the standard machinery for heuristic search, like A*. The authors might consider a real implementation of simulated annealing, or maybe beefing up the greedy search into a beam search.
</p>
<p>
There's also a polynomial-time exact solution to this kind of matching problem (the Hungarian algorithm). It's often slower than the approximate search techniques, but it guarantees the optimal matching.
</p>
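<p>
For small symbols, the exact matching alluded to above can be brute-forced in a few lines (illustrative only, with a made-up cost matrix; a Hungarian-algorithm implementation finds the same optimum in polynomial time):
</p>

```python
from itertools import permutations

def best_matching(cost):
    """Exact min-cost assignment of symbol primitives to template primitives.

    cost[i][j] is an assumed dissimilarity between symbol primitive i and
    template primitive j. O(n!) here; the Hungarian algorithm is O(n^3).
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm
```

<p>
Each greedy or stochastic search in the paper is an approximation of exactly this optimization.
</p>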
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-85115121095128311332010-12-12T14:33:00.000-08:002010-12-12T14:33:17.329-08:00Reading #15: Image-Based Trainable Symbol Recognizer<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Kara and Stahovich present a vision-based trainable symbol recognizer. The recognizer is scale, translation, and rotation invariant, and runs very quickly. The system is an instance-based classifier, so it is easy to add new classes or new training instances of already-defined classes.
</p>
</div>
<div class="discussion">
<p>
This paper presents a good symbol recognizer that's pretty easy to implement. It's also a good introduction to a number of interesting distance metrics.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-81182734200623166322010-12-12T14:26:00.000-08:002010-12-12T14:26:44.132-08:00Reading #14: Entropy<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Bhat presents a system for distinguishing between text and shape strokes based on the entropy of the strokes. Each stroke is translated into a string of characters representing the direction it is heading at each point: a letter is added whenever the stroke changes direction, with special markers for the endpoints. A simple zero-order definition of entropy is then computed over those strings. Bhat also uses the gzip library to provide a higher-order estimate of entropy.
</p>
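<p>
A minimal sketch of the zero-order entropy idea (the direction alphabet here is an assumed simplification; Bhat's actual encoding adds letters only on direction changes and includes special endpoint markers):
</p>

```python
import math
from collections import Counter

def direction_string(points, bins=8):
    """Encode a stroke as one letter per segment, quantizing the heading into bins."""
    letters = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        letters.append(chr(ord('a') + int(angle / (2 * math.pi / bins)) % bins))
    return ''.join(letters)

def zero_order_entropy(s):
    """Shannon entropy over symbol frequencies, in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

<p>
A straight shape stroke yields one repeated letter (entropy near 0), while wiggly text strokes produce many distinct letters and high entropy.
</p>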
</div>
<div class="discussion">
<p>
Bhat further improves on the accuracy of the Patel et al. system from Reading #13.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-21833143515871807912010-12-12T14:09:00.000-08:002010-12-12T14:09:30.858-08:00Reading #13: Ink Features for Diagram Recognition<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Patel et al. present a number of features for discriminating between text and shape strokes, which they feed into a decision tree. Like other approaches, the accuracy is biased toward text strokes, but the Patel system improves on previous systems.
</p>
</div>
<div class="discussion">
<p>
The Patel et al. system is very simple and computationally efficient, yet it performs the best out of all systems compared. I would have liked to see more discussion of why the authors chose to use rpart instead of other decision tree algorithms like C4.5 or algorithms specifically intended for feature subset selection.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-33019650910873455182010-10-11T22:13:00.000-07:002010-10-11T22:13:16.723-07:00Reading #12: Constellation models<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Sharon describes a system of Constellation Models for Sketch Recognition. Constellation models describe sketches by the relationships between various objects. The objects themselves are recognized using very simple features: the bounding box position, diagonal length, and angle. Interactions between objects are also described with very simple features: the distance between the centers of the strokes, the shortest distance between any two endpoints and the shortest distance between any two points.
</p>
<p>
The feature vectors are used in a maximum-likelihood (ML) search through the space of possible labelings, finding the assignment of labels to strokes that maximizes the probability of the labeling given the training data. The search is broken into two steps, required components first and optional components second, which drastically reduces the search space by limiting both the number of candidate labels and the number of candidate strokes at each step.
</p>
</div>
<div class="discussion">
<p>
I think this is a really neat approach. With respect to the current assignment, it would be pretty cool to break course of action diagrams into components like this. For example, to identify the echelon modifier and the strength modifier.
</p>
<p>
This approach is also somewhat similar to LADDER in that the position of each object relative to the other objects in the sketch matters a great deal. For example, a right eye will never be on the wrong side of the left eye. These constraints are not modeled explicitly, but are learned from the training data. The constellation model only understands positional relationships, as opposed to LADDER, which can model more complex relationships like parallelism.
</p>
</div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-3855608581161715266.post-51857522904164908852010-10-11T21:58:00.000-07:002010-10-11T21:58:15.021-07:00Reading #11 LADDER<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
In this paper, Tracy unleashes LADDER unto the world. LADDER is a sketching language, designed to describe sketches as combinations of geometric primitives following geometric constraints. A primitive shape is one that cannot be broken down into simpler shapes, for example a line- or arc-segment. Geometric constraints define how shapes interact with the world and with each other. For example, a constraint might describe one line as horizontal or two lines as perpendicular.
</p>
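<p>
To make the constraint idea concrete, here's a hypothetical Python analogue of two LADDER-style constraints (LADDER itself uses its own declarative, Lisp-like description language, not code like this):
</p>

```python
import math

def heading(line):
    """Direction angle of a line given as ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = line
    return math.atan2(y1 - y0, x1 - x0)

def is_horizontal(line, tol=0.1):
    """Horizontal if the heading is within tol radians of 0 (left-to-right only)."""
    return abs(heading(line)) < tol

def are_perpendicular(l1, l2, tol=0.1):
    """Perpendicular if the two headings differ by roughly 90 degrees."""
    diff = abs(heading(l1) - heading(l2)) % math.pi
    return abs(diff - math.pi / 2) < tol
```

<p>
A complex shape like a rectangle would then be described as four lines satisfying a conjunction of such constraints.
</p>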
<p>
LADDER builds complex shapes up from combinations of simpler shapes, simultaneously tackling the grouping and recognition problems. The downside to this approach is that putting a large number of primitives on the screen will cause an exponential runtime explosion, which has been known to injure innocent bystanders with recursively-enumerable shrapnel.
</p>
</div>
<div class="discussion">
<p>
Tracy notes that LADDER works well for shapes that can be described in terms of geometric primitives, especially shapes that don't have too much detail. How, then, should we deal with detailed shapes? This is one area of consideration for the current assignment. Is there a general way of approaching detailed shapes, or does each shape require custom-tailored recognition techniques?
</p>
</div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-3855608581161715266.post-52270535033483031872010-10-11T21:45:00.000-07:002010-10-11T21:45:45.274-07:00Reading #10: Herot<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Christopher Herot implements the Sezgin/Stahovich corner finder more than 25 years before they do. Or did he? I see a citation of Negroponte and Taggart from '71 called "HUNCH: an experiment in sketch recognition." Maybe it really was more like 30 years before, and not necessarily Herot.
</p>
<p>
In my estimation, Herot's real contribution in this paper is the exploration of context and user-guided recognition. Herot discusses training the corner finder to be parameterized for a single user. For example, if the user draws a square and tells STRAIN that she has done so, then STRAIN can be adjusted until it finds exactly five corners (including both endpoints).
</p>
</div>
<div class="discussion">
<p>
Herot talks about representing sketches in a context-free structure. This reminds me of LADDER, whose Lisp-like language can probably be parsed by a simple push-down automaton. Okay, that's just a fancy way of saying that HUNCH and LADDER consider the context of a sketch in very similar ways.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-32165627524860472222010-09-26T22:42:00.000-07:002010-09-26T22:42:55.839-07:00Reading #9: Paleo<div class="commented">
<li>None yet</li>
</div>
<div class="summary">
<p>
Paulson introduces PaleoSketch, a system for recognizing primitive shapes. A primitive shape is a simple shape drawn with one stroke, many of which cannot easily be broken down into simpler shapes. Paleo uses lots of rules to classify shapes as one of:
<ul>
<li>Line</li>
<li>Polyline</li>
<li>Arc</li>
<li>Circle</li>
<li>Ellipse</li>
<li>Helix</li>
<li>Spiral</li>
<li>Curve</li>
</ul>
Paleo recognizes more shapes than the previous work (Sezgin) and also features a much higher accuracy.
</p>
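<p>
In the same rule-based spirit, here's a toy version of a line test (an illustration with a made-up threshold, not Paleo's actual least-squares and feature-area tests):
</p>

```python
import math

def is_line(points, tol=1.05):
    """A stroke is roughly a line if its path length barely exceeds the
    straight-line distance between its endpoints (tol is illustrative)."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return chord > 0 and path / chord < tol
```

<p>
Paleo computes many such fits along with their error metrics, then uses its hierarchy to pick a winner.
</p>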
</div>
<div class="discussion">
<p>
PaleoSketch is an important tool, because we have it now and it does a pretty decent job of solving an important problem. That being said, I don't like it one bit.
</p>
<p>
Paleo (as described in the paper) has 26 parameters. 26! That's a lot. And they're all "empirically determined", which means hand-tuned. If you add a new primitive shape, do you have to re-tune all 26 parameters plus the new ones you're bound to introduce if you extend the approach? If you turn off some primitives, are you working with an unoptimized parameter set? If the parameters aren't learned, there's no way for the end user to retrain them.
</p>
<p>
Next we come to the hierarchy. I have so many questions here. Is it overfit to the particular training case (single primitive shapes drawn in isolation)? I wrote a new primitive fit, okay, now where do I put it in the hierarchy? Or maybe I turned off some primitive fits, is there a better hierarchy for my new set? Again, how can I customize this to my particular needs without some painstaking, manual process?
</p>
<p>
Now the reason for the hierarchy: a line fit is not comparable to an arc fit is not comparable to a helix fit. If you can't compare fits to each other, can you possibly give me a reasonable estimate of a confidence for the fit? Consider the curvy-polyline case: maybe I want to know that it's a curve that's also pretty darn close to a polyline. If all Paleo can tell me is that it's more curve than polyline, how do I know whether polyline is even a reasonable runner-up? Until you can boast 100% accuracy all the time, I'd really like a reasonable estimate of your confidence (and if you have that for all your fits, it would seem they're comparable after all).
</p>
<p>
Sure, it's obvious that Paleo runs in real time. But how much time does it leave for a high-level recognizer to also run and still call it "real time"? Calculating individual fits certainly seems like an extensible approach, but putting all of the features through a classifier (linear, quadratic, SVM, neural net, decision tree, whatever) would probably be faster. I think the extensibility argument is moot given the implications of parameter tuning and fitting new fits into the hierarchy. All that's left is to wonder which does a better job, accuracy-wise.
</p>
</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-30964070557637752352010-09-20T18:55:00.000-07:002010-09-20T18:55:10.122-07:00Reading #7: Sezgin Early Stroke Processing<div class="commented">
<li><a href="http://danibits.blogspot.com/2010/09/reading-7-sketch-based-interfaces-early.html?showComment=1285033741016#c603221144914810330">Daniellita</a></li>
</div>
<div class="summary">
<p>This is a seminal work in corner finding. I've heard lots about it from Aaron over the course of his various talks. Sezgin combines speed and curvature data to filter through a set of potential corners. The segments are then fit to lines or bezier segments.
</p>
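<p>
The speed-plus-curvature filtering can be sketched roughly like this (the thresholds and details are illustrative, not the paper's):
</p>

```python
import math

def speeds(points, times):
    """Pen speed between consecutive samples."""
    return [math.dist(p, q) / max(t1 - t0, 1e-9)
            for p, q, t0, t1 in zip(points, points[1:], times, times[1:])]

def curvatures(points):
    """Absolute turning angle at each interior point."""
    out = []
    for a, b, c in zip(points, points[1:], points[2:]):
        a1 = math.atan2(b[1] - a[1], b[0] - a[0])
        a2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(a2 - a1) % (2 * math.pi)
        out.append(min(d, 2 * math.pi - d))
    return out

def corner_candidates(points, times, speed_frac=0.5, curv_thresh=math.pi / 4):
    """Interior indices where the stroke turns sharply or the pen slows way down."""
    v = speeds(points, times)
    k = curvatures(points)
    mean_v = sum(v) / len(v)
    return [i + 1 for i in range(len(k))
            if k[i] > curv_thresh or min(v[i], v[i + 1]) < speed_frac * mean_v]
```

<p>
Sezgin's actual method generates candidate sets from each signal separately and then filters and combines them before fitting the segments.
</p>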
<p>
Sezgin also discusses some beautification techniques.
<ul>
<li>Constraint satisfaction to make ends meet and parallel slopes parallel, etc.</li>
<li>Replacing recognized strokes with a predefined symbol</li>
</ul>
</p>
</div>
<div class="discussion">
<p>
I like the use of bezier segments. Most segmentation papers either restrict sketches to polylines or use arc segments. I've never liked arc segments because they aren't descriptive. My solution would be ellipse segments, which are more configurable. The drawback is that a straight line can match the edge of a really long, flat ellipse. I think it would be pretty simple to constrain the ellipses to prevent this kind of action, though.
</p>
<p>
Bezier segments, on the other hand, are 3rd-degree polynomials that can approximate most strokes pretty well. For example, the Microsoft ink library does some bezier approximation of strokes to smooth them out, and it looks really nice. My worry would be that bezier curves are too descriptive and can approximate something complex enough that it really should be segmented.
</p>
<p>
"correct" isn't really defined in the evaluation section. 96% is really good, and the cynic in my is asking what's up? why isn't the system in this paper the de facto standard for corner finding? why didn't the reviewers ask how they defined "correct"? clarify, please, so i don't have to doubt you. until then, my assumption is that you're using statistics to lie. it may be unintentional, but i still am very turned off by it. opaque is bad. transparent is good. (can you tell that i like OSS?)
</p>
</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-3855608581161715266.post-32703959952526256032010-09-20T18:33:00.000-07:002010-09-20T18:34:01.527-07:00Reading #8: $N<div class="commented">
<li><a href="http://theyueli.blogspot.com/2010/09/reading-7-n-recognizer.html?showComment=1285030979546#c7327520500535326903">Yue Li</a></li>
</div>
<div class="summary">
<p>$N is a natural extension of the $1 approach to multi-stroke symbols. It lifts restrictions on (or removes the ability to specify) stroke order and direction, moving it closer to a free-sketch symbol recognizer. It can also lift the restriction on the number of strokes, though that restriction can be kept if desired (for speed, or as an extra constraint).
</p>
<p>
Very much in the spirit of $1, $N is very simple. The templates are very similar to $1 templates: each is a single "stroke" resampled to the same number of points (PPS) for every template. Matching two templates is also the same. Instead of figuring out how to put together N strokes into 1, $N just tries all possible combinations of stroke order and direction. Simple, thorough.
</p>
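<p>
The "try all combinations" step is easy to picture in code; here's a sketch of the N! * 2^N enumeration (assumed structure, not the authors' implementation):
</p>

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Every way to join N strokes into one: each stroke ordering times each
    combination of direction flips, i.e. N! * 2^N candidate unistrokes."""
    results = []
    for order in permutations(strokes):
        for flips in product([False, True], repeat=len(order)):
            joined = []
            for stroke, flip in zip(order, flips):
                joined.extend(reversed(stroke) if flip else stroke)
            results.append(joined)
    return results
```

<p>
Each candidate is then resampled and matched exactly as in $1, which is where the factorial blow-up comes from.
</p>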
</div>
<div class="discussion">
<p>
I think anyone who makes a multi-stroke version of $1 is automatically a pretty cool person. I mean, it really was just asking for it. $N is a good effort. Really, really in the spirit of $1. Super duper simple.
</p>
<p>
I disagree with some of the design choices. Including the "in the air" portions of the symbol introduces ambiguity (x vs alpha (the alpha glyph for this font looks terrible, sorry)) and dilutes the data (longer strokes means more space between points when resampled). Having complexity based on the number of strokes is a tough battle. Having O(N!*2^N) complexity in the number of strokes is an even tougher battle. 5 strokes means >3000 templates, 6 strokes goes up to >46000, 7 strokes is >645000. As stated, this limits the expressivity of the template set.
</p>
<p>
That being said, no one is claiming $N is some sort of silver bullet. In fact, the authors are very upfront about the limitations of $N.
</p>
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-3855608581161715266.post-23726305650963337862010-09-13T14:54:00.000-07:002010-09-20T19:03:32.687-07:00Reading #6: Protractor<div class="commented">
<li><a href="http://drewlogsdon.blogspot.com/2010/09/reading-6-protractor-fast-and-accurate.html?showComment=1285034591777#c2426179337629956713">Drew</a></li>
</div>
<div class="summary">
<p>Protractor applies a little calculus to $1 to get rid of the iterative rotation search, solving for the optimal rotation in closed form. It also uses cosine similarity instead of euclidean distance. These small differences give it a significant advantage when tested on a large dataset of Latin characters drawn with fingers on a mobile device. Finally, Protractor achieves the same or better accuracy than $1 using 16 PPS instead of 64, which helps reduce computation time and memory use.
</p>
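<p>
The vector view at Protractor's core is simple to sketch (a simplification: the real algorithm also normalizes the gesture and solves for the optimal rotation in closed form):
</p>

```python
import math

def flatten(points):
    """Interleave (x, y) pairs into a single 2n-dimensional vector."""
    return [c for p in points for c in p]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length gesture vectors;
    1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

<p>
Classification is then just picking the template whose vector maximizes the similarity to the input.
</p>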
</div>
<div class="discussion">
<p>I mentioned before that I liked $1 as a platform to build stuff off of. Here's another good example of customizing the base algorithm to add something new like rotation sensitivity.</p>
<p>I would like to see a sensitivity analysis on the points per shape for both $1 and Protractor. Until I see that, I don't think it's quite fair to say that Protractor requires 1/4 the memory of $1, since you could just use 16 PPS as the constant in the original $1 algorithm, too.</p>
<p>Using the cosine distance is clever; I've never really thought about treating stroke data as one n-dimensional vector before, rather than as a collection of 2- or 3-dimensional vectors (depending on whether time data is present). It does open a can of worms, though: what about other distance or similarity metrics beyond euclidean and cosine?</p>
</div>Unknownnoreply@blogger.com1