Sunday, September 26, 2010

Reading #9: Paleo

• Paulson introduces PaleoSketch, a system for recognizing primitive shapes. A primitive shape is a simple shape drawn with one stroke, one that can't easily be broken down into simpler shapes. Paleo uses lots of hand-crafted rules to classify each stroke as one of:

    • Line
    • Polyline
    • Arc
    • Circle
    • Ellipse
    • Helix
    • Spiral
    • Curve
Paleo recognizes more shapes than the previous work (Sezgin) and also achieves much higher accuracy.
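To make the rule-based flavor concrete, here's a toy version of what a single primitive test might look like: fit a least-squares line and accept "line" if the error is tiny relative to the stroke length. The test and the threshold are my own invention, not Paleo's actual rules (which are far more involved).

```python
# Toy illustration of a rule-based primitive fit test
# (NOT Paleo's actual rules or thresholds -- those are in the paper).
import math

def line_fit_error(points):
    """Least-squares line through the points; return mean perpendicular error."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    # Direction of the best-fit line = principal axis of the covariance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    nx, ny = -math.sin(theta), math.cos(theta)   # unit normal to the line
    return sum(abs((x - mx) * nx + (y - my) * ny) for x, y in points) / n

def looks_like_line(points, threshold=0.02):
    """Hypothetical rule: call it a line if error is tiny relative to length."""
    length = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    return length > 0 and line_fit_error(points) / length < threshold

print(looks_like_line([(0, 0), (1, 0.01), (2, -0.01), (3, 0)]))   # nearly straight -> True
print(looks_like_line([(0, 0), (1, 1), (2, 0), (3, 1)]))          # zigzag -> False
```

Paleo runs a battery of tests in this spirit, one per primitive, and then its hierarchy decides which surviving fit wins.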

    PaleoSketch is an important tool, because we have it now and it does a pretty decent job of solving an important problem. That being said, I don't like it one bit.

    Paleo (as described in the paper) has 26 parameters. 26! That's a lot. And they're all "empirically determined", which means hand-tuned. If you add a new primitive shape, do you have to re-tune all 26 parameters plus the new ones you're bound to introduce if you extend the approach? If you turn off some primitives, are you working with an unoptimized parameter set? If the parameters aren't learned, there's no way for the end user to retrain them.

    Next we come to the hierarchy. I have so many questions here. Is it overfit to the particular training case (single primitive shapes drawn in isolation)? I wrote a new primitive fit, okay, now where do I put it in the hierarchy? Or maybe I turned off some primitive fits, is there a better hierarchy for my new set? Again, how can I customize this to my particular needs without some painstaking, manual process?

Now the reason for the hierarchy: a line fit is not comparable to an arc fit is not comparable to a helix fit. If you can't compare fits to each other, can you possibly give me a reasonable estimate of a confidence for the fit? Consider the curvy-polyline case: maybe I want to know it's a curve that's also pretty darn close to a polyline. If all Paleo can tell me is that it's more curve than polyline, how do I know if polyline is even a reasonable runner-up? Until you can boast 100% accuracy all the time, I'd really like to have a reasonable estimate of your confidence (and if you have that for all your fits, it would seem they're comparable to each other after all).

    Sure, it's obvious that Paleo runs in real time. But how much time does it leave for a high-level recognizer to also run and still call it "real time"? Calculating individual fits certainly seems like an extensible approach, but putting all of the features through a classifier (linear, quadratic, SVM, neural net, decision tree, whatever) would probably be faster. I think the extensibility argument is moot given the implications of parameter tuning and fitting new fits into the hierarchy. All that's left is to wonder which does a better job, accuracy-wise.
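For what it's worth, here's the kind of thing I have in mind: summarize each stroke by its per-primitive fit errors and hand that vector to any off-the-shelf classifier (scikit-learn's decision tree is just a stand-in here). The feature layout and training data are entirely made up; this is a sketch of the alternative, not anything from the paper.

```python
# A minimal sketch of "just throw the fit errors at a standard classifier".
# Feature layout and training data are hypothetical, not from Paleo.
from sklearn.tree import DecisionTreeClassifier

# Suppose each stroke is summarized by its fit error under each primitive fit.
# Rows: [line_error, arc_error, circle_error, polyline_error]
X_train = [
    [0.01, 0.40, 0.90, 0.05],   # cleanly a line
    [0.50, 0.03, 0.30, 0.20],   # cleanly an arc
    [0.80, 0.25, 0.02, 0.60],   # cleanly a circle
    [0.30, 0.45, 0.85, 0.04],   # a polyline
]
y_train = ["line", "arc", "circle", "polyline"]

clf = DecisionTreeClassifier().fit(X_train, y_train)
# Prints the predicted class for a new stroke's fit errors.
print(clf.predict([[0.02, 0.35, 0.88, 0.06]]))
```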

    Monday, September 20, 2010

    Reading #7: Sezgin Early Stroke Processing

This is a seminal work in corner finding. I've heard lots about it from Aaron over the course of his various talks. Sezgin combines speed and curvature data to filter through a set of potential corners. The segments are then fit to lines or Bézier segments.

    Sezgin also discusses some beautification techniques.

    • Constraint satisfaction to make ends meet and parallel slopes parallel, etc.
    • Replacing recognized strokes with a predefined symbol

I like the use of Bézier segments. Most segmentation papers either restrict sketches to polylines or use arc segments. I've never liked arc segments because they aren't very descriptive. My solution would be ellipse segments, which are more configurable. The drawback is that a straight line can match the edge of a really long, flat ellipse. I think it would be pretty simple to constrain the ellipses to prevent that kind of degenerate fit, though.
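Concretely, the constraint I'm imagining is nothing fancier than an aspect-ratio check on the fitted ellipse (the threshold below is made up):

```python
# One way to cash out "constrain the ellipses": reject fits that are so
# flat they could just as well be lines. The threshold is made up.
def ellipse_fit_acceptable(major_axis, minor_axis, min_aspect=0.15):
    """Reject degenerate ellipses whose minor axis is tiny relative to the major."""
    return major_axis > 0 and (minor_axis / major_axis) >= min_aspect

print(ellipse_fit_acceptable(major_axis=10.0, minor_axis=4.0))   # True
print(ellipse_fit_acceptable(major_axis=10.0, minor_axis=0.2))   # False -- basically a line
```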

Bézier segments, on the other hand, are 3rd-degree polynomials that can pretty well approximate most strokes. For example, the Microsoft ink libraries do some Bézier approximation of strokes to smooth them out, and it looks really nice. My worry would be that Bézier curves are too descriptive and can approximate something complex enough that it really should be segmented.
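For reference, a cubic Bézier segment is just a 3rd-degree blend of four control points, which is why it can hug most stroke shapes so well. A quick sketch (the control points here are arbitrary):

```python
# A cubic Bezier segment is a 3rd-degree polynomial in the parameter t,
# blending four control points. Quick sketch with made-up control points.
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate the cubic Bezier curve at parameter t in [0, 1]."""
    mt = 1.0 - t
    x = mt**3 * p0[0] + 3 * mt**2 * t * p1[0] + 3 * mt * t**2 * p2[0] + t**3 * p3[0]
    y = mt**3 * p0[1] + 3 * mt**2 * t * p1[1] + 3 * mt * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

# Sample the curve densely to approximate (or smooth) a stroke segment.
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
curve = [cubic_bezier(*ctrl, t=i / 31) for i in range(32)]
print(curve[0], curve[16], curve[-1])   # start, middle-ish, end of the segment
```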

    "correct" isn't really defined in the evaluation section. 96% is really good, and the cynic in my is asking what's up? why isn't the system in this paper the de facto standard for corner finding? why didn't the reviewers ask how they defined "correct"? clarify, please, so i don't have to doubt you. until then, my assumption is that you're using statistics to lie. it may be unintentional, but i still am very turned off by it. opaque is bad. transparent is good. (can you tell that i like OSS?)

    Reading #8: $N

$N is a natural extension of the $1 approach to multi-stroke symbols. It lifts restrictions on (or removes the ability to specify) stroke order and direction. This moves it closer to a free-sketch symbol recognizer. It can also lift the restriction on the number of strokes, though that one can be kept if desired (for speed, or as an extra constraint).

Very much in the spirit of $1, $N is very simple. The templates are very similar to $1 templates: each is a single combined "stroke" with the same PPS points per stroke. Matching a candidate against a template works the same way as in $1. Instead of figuring out how to cleverly merge N strokes into 1, $N just tries all possible combinations of stroke order and direction. Simple, thorough.
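Here's roughly how I read the "try everything" step: enumerate every stroke order and every per-stroke direction, chain the strokes into one long unistroke, and run each candidate through the usual $1-style matching. This is my own sketch of the idea, not the authors' code.

```python
# Roughly how I read the "try every stroke order and direction" step of $N
# (a sketch of the idea, not the authors' implementation).
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Yield every way to chain the strokes into one 'unistroke':
    all stroke orders (N!) times all per-stroke directions (2^N)."""
    n = len(strokes)
    for order in permutations(range(n)):
        for flips in product([False, True], repeat=n):
            combined = []
            for idx, flip in zip(order, flips):
                s = strokes[idx]
                combined.extend(reversed(s) if flip else s)
            yield combined

# Two toy strokes forming a plus sign: 2! * 2^2 = 8 unistroke candidates.
strokes = [[(0, 1), (2, 1)], [(1, 0), (1, 2)]]
candidates = list(unistroke_permutations(strokes))
print(len(candidates))   # 8 -- and this count blows up as N! * 2^N with more strokes
```

Each candidate then goes through the same resample/normalize/match pipeline as a $1 unistroke, which is exactly where the complexity complaint below comes from.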

    I think anyone who makes a multi-stroke version of $1 is automatically a pretty cool person. I mean, it really was just asking for it. $N is a good effort. Really, really in the spirit of $1. Super duper simple.

    I disagree with some of the design choices. Including the "in the air" portions of the symbol introduces ambiguity (x vs alpha (the alpha glyph for this font looks terrible, sorry)) and dilutes the data (longer strokes means more space between points when resampled). Having complexity based on the number of strokes is a tough battle. Having O(N!*2^N) complexity in the number of strokes is an even tougher battle. 5 strokes means >3000 templates, 6 strokes goes up to >46000, 7 strokes is >645000. As stated, this limits the expressivity of the template set.

    That being said, no one is claiming $N is some sort of silver bullet. In fact, the authors are very upfront about the limitations of $N.

    Monday, September 13, 2010

    Reading #6: Protractor

Protractor applies a little calculus to $1 to replace the iterative rotation search with a closed-form solution. It also uses the cosine similarity metric instead of the Euclidean distance metric. These small differences give it a significant advantage when tested on a large dataset of Latin characters drawn with fingers on a mobile device. Finally, Protractor achieves the same or better accuracy than $1 while using 16 PPS instead of 64, which helps reduce computation time and memory use.

    I mentioned before that I liked $1 as a platform to build stuff off of. Here's another good example of customizing the base algorithm to add something new like rotation sensitivity.

    I would like to see a sensitivity analysis on the points per shape for both $1 and Protractor. Until I see that, I don't think it's quite fair to say that Protractor requires 1/4 the memory of $1, since you could just use 16 PPS as the constant in the original $1 algorithm, too.

Using the cosine distance is clever; I've never really thought about treating stroke data as one n-dimensional vector before, rather than as a collection of 2- or 3-dimensional vectors (depending on whether time data is present). It does open a can of worms, though: what about other distance or similarity metrics beyond Euclidean and cosine?
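My reading of the closed-form trick, as a sketch (this is the fully rotation-invariant flavor; Protractor also has an orientation-sensitive mode, and this is not the paper's exact code): treat each resampled, centered gesture as one long vector of (x, y) coordinates, and the rotation that maximizes the cosine similarity falls out analytically.

```python
# Sketch of a closed-form, rotation-invariant cosine similarity between two
# gestures (my reading of the Protractor idea, not the paper's code).
import math

def optimal_cosine_similarity(t, g):
    """t, g: equal-length lists of (x, y) points, already resampled and centered.
    Returns the max cosine similarity over all rotations of g."""
    a = sum(tx * gx + ty * gy for (tx, ty), (gx, gy) in zip(t, g))
    b = sum(ty * gx - tx * gy for (tx, ty), (gx, gy) in zip(t, g))
    norm_t = math.sqrt(sum(x * x + y * y for x, y in t))
    norm_g = math.sqrt(sum(x * x + y * y for x, y in g))
    # Rotating g by theta gives dot product a*cos(theta) + b*sin(theta),
    # which is maximized at theta = atan2(b, a) with value sqrt(a^2 + b^2).
    return math.sqrt(a * a + b * b) / (norm_t * norm_g)

# A gesture compared with a 90-degree rotated copy of itself: similarity 1.0.
t = [(-1.0, 0.0), (0.0, 0.5), (1.0, -0.5)]
g = [(0.0, -1.0), (-0.5, 0.0), (0.5, 1.0)]   # t rotated by 90 degrees
print(optimal_cosine_similarity(t, g))       # ~1.0
```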

    Reading #5: $1

The $1 algorithm is a "cheap trick" to achieve single-stroke gesture recognition in almost any setting. The code is simple enough to write in any language and run on any platform. After all, that was the whole point of the algorithm. Even though it was originally intended for interface prototyping, there have been many applications built using $1 as the core recognition engine.

    $1 is a template matching algorithm. It performs the same pre-processing on training and testing examples:

    1. Resample the stroke to have PPS points
    2. Calculate the centroid and the indicative angle (angle from centroid to 1st point)
    3. Standardize: translate centroid to (0,0), scale to standard bounding box, rotate indicative angle to 0
For testing examples, there is a 4th step: match against each of the templates, and the closest match wins. (A rough sketch of the whole pipeline is below.)
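Here's that pre-processing in code. It's my own condensation (constants, sub-step order, and edge-case handling are approximate), not the published pseudocode.

```python
# A rough sketch of the $1 pre-processing steps described above (my own
# condensation, not the authors' published pseudocode).
import math

PPS = 64          # points per stroke after resampling
SQUARE = 250.0    # side of the standard bounding box

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=PPS):
    """Step 1: space n points evenly along the stroke's arc length."""
    interval = path_length(pts) / (n - 1)
    new_pts, d = [pts[0]], 0.0
    pts = list(pts)
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and d + seg >= interval:
            t = (interval - d) / seg
            qx = pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0])
            qy = pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1])
            new_pts.append((qx, qy))
            pts.insert(i, (qx, qy))   # the new point starts the next segment
            d = 0.0
        else:
            d += seg
        i += 1
    while len(new_pts) < n:           # guard against floating-point shortfall
        new_pts.append(pts[-1])
    return new_pts[:n]

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def normalize(pts):
    """Steps 2-3: rotate the indicative angle to zero, scale, then translate."""
    cx, cy = centroid(pts)
    theta = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)      # indicative angle
    rot = [((x - cx) * math.cos(theta) - (y - cy) * math.sin(theta),
            (x - cx) * math.sin(theta) + (y - cy) * math.cos(theta)) for x, y in pts]
    xs, ys = [x for x, _ in rot], [y for _, y in rot]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    scaled = [(x * SQUARE / w, y * SQUARE / h) for x, y in rot]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]             # centroid at (0, 0)

stroke = [(0, 0), (30, 40), (80, 45), (120, 10)]
template_ready = normalize(resample(stroke))
print(len(template_ready), template_ready[0])
```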

    I really like $1 as a platform to build cool stuff with. People have used it for games. I used it to build an almost-as-simple free sketch recognizer. Its simplicity makes it really easy to customize to your exact needs.

For some reason, it's taken me until now to wonder: what about k-NN instead of 1-NN? I wonder if there's any benefit to that. I would guess you'd need more than the 1-2 examples per class featured in the website version for it to be effective.
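To be clear about what I mean, the change is tiny: instead of "closest template wins", take the k nearest templates and vote. Purely speculative on my part, and the distances are whatever $1 already computes.

```python
# What I mean by k-NN over templates instead of "single closest template wins"
# (speculation on my part -- not part of $1). Distances come from $1's matching.
from collections import Counter

def knn_classify(distances_and_labels, k=3):
    """distances_and_labels: list of (distance_to_template, class_label)."""
    nearest = sorted(distances_and_labels)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

scores = [(3.1, "arrow"), (3.3, "arrow"), (2.9, "caret"), (7.0, "star"), (3.0, "arrow")]
print(knn_classify(scores, k=3))   # "arrow" wins 2-1 even though "caret" is closest
```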

    Wednesday, September 8, 2010

    Reading #4 Sketchpad

In this paper, Ivan Sutherland pioneers the (now gigantic) fields of computer graphics and human-computer interaction, invents the idea of a sketch as input to external computational applications, and creates a library of reusable sketch components. All in a day's work.

    I had forgotten that solving for the forces on a bridge was one of the applications with Sketchpad. That sounds an awful lot like another project I've heard of before. I guess I should make sure to cite this paper when I write up Mechanix.

    Monday, September 6, 2010

    Reading #3 Long

    Long introduces Quill, a system to help UI designers pick a good set of gestures. Picking a good set of gestures will reduce human confusion and recognizer confusion, both good things. Quill also allows users to train the recognizer on the fly.

    This tool is like a good scope on a rifle. It increases efficiency and ease of use. However, I usually disagree with shooting people. That may be a strong analogy, but I doubt you'll see me advocating the use of gestures in a sketch system.

    Reading #2 Rubine

    This paper is a classic. It begins by explaining GRANDMA, a gesture-based sketching system. I'm not going to talk much about it, because that's not what's important about this work. In order to power GRANDMA, Rubine developed a gesture recognition system by combining 13 intuitive, but creative features with the classic linear discriminator. The real contribution of the work is the set of 13 features.

Rubine's features are really pretty cool. They turn a bunch of data into a handful of really intuitive descriptors for a stroke. Humans really get "how curvy it is", "how square it is", "how long it is", "how fast it was drawn". Apparently, computers really get it, too, because recognition is quite good.
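To give a flavor of what those descriptors look like, here are a few features in the same spirit (not a faithful re-implementation of Rubine's 13): total length, total turning, bounding-box diagonal, and end-to-end distance. Vectors like this are what the linear discriminator consumes.

```python
# A few features in the spirit of Rubine's set (not a re-implementation of all 13),
# just to show how raw points become a handful of intuitive numbers.
import math

def stroke_features(pts):
    total_length = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    # Total turning angle: how much the pen direction changes ("how curvy it is").
    total_turn = 0.0
    for i in range(1, len(pts) - 1):
        a1 = math.atan2(pts[i][1] - pts[i - 1][1], pts[i][0] - pts[i - 1][0])
        a2 = math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
        d = a2 - a1
        while d > math.pi:  d -= 2 * math.pi
        while d < -math.pi: d += 2 * math.pi
        total_turn += abs(d)
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    bbox_diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))   # "how big it is"
    end_to_end = math.dist(pts[0], pts[-1])                        # "how closed it is"
    return [total_length, total_turn, bbox_diag, end_to_end]

# A nearly closed square: long, lots of turning, big box, small end-to-end gap.
print(stroke_features([(0, 0), (10, 0), (10, 10), (0, 10), (0, 1)]))
```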

    My summary is pretty filled with opinion, so do read that even if you think you know what the paper's all about. To be quite honest, I often cringe a little when I think of Rubine. It's been around for so long, and I've heard it so many times, sometimes it makes me want to shout "JUST MOVE ON ALREADY". But there's a good reason that Rubine comes up so often. It's really simple (if you sort of ignore the math as just part of the algorithm) and it's got a really fast classification speed. I wonder if a significant improvement could be made by applying a newer, non-linear machine learning algorithm to Rubine's features.

    Wednesday, September 1, 2010

    Hammond Gesture Recognition Survey Chapter

This chapter surveys gesture recognition techniques. The chapter begins with Rubine's feature-based algorithm, which was one of the first gesture recognition algorithms, and one of the first to achieve such high accuracy. Next, Hammond presents Long's extensions to Rubine's feature set. These two algorithms (both the original and the extended) are commonly used today and usually give good results.

    Finally, Hammond introduces Wobbrock's $1 recognition algorithm, which is based on template matching as opposed to feature-based classification. The $1 algorithm is especially notable because it's so simple a small child could implement it, yet the accuracy is comparable to Rubine and Long.

    This chapter gives a very good brief survey of gesture recognition techniques. Rubine is basically the seminal paper in the field, and Long is a natural extension. Wobbrock provides a good balance with a really different approach, and demonstrates that an algorithm does not need to be complex to work well. The Electronic Cocktail Napkin might be an appropriate inclusion because it is also vastly different from Rubine/Long or Wobbrock.

    In the case of Rubine, the survey chapter has a much clearer focus on the meaning and importance of each feature, and gives less importance to GRANDMA, which has not aged as well as the core algorithm. After reading such a good, thorough overview of the algorithm, I'm questioning why we're reading the original paper, too. In the case of $1, the original paper is just as clear, just as focused, and more detailed. I would rather have just read that.

Since I'm simple-minded and easily distracted, Aaron's notes and the missing figures were distracting.

Questionnaire

    1. martin.field@gmail
    2. I am a 2nd year PhD student.
    3. I was told I would learn how to cure my terminal disease.
    4. I've taken "this class" as offered by Christine Alvarado at Harvey Mudd. I've spent a year and two summers doing research in sketch recognition.
    5. In 10 years, I expect to be teaching a class very much like this one.
    6. I think the next biggest technological advancement in computer science will be sliced breadboards.
7. My favorite course as an undergraduate was Algorithms, mostly due to the professor. Ask Drew; I must have told him 100 times how awesome the professor was.
8. No movies seem to have left a lasting impression on me. Recently, the best movie I've seen is Inception. Walking around in people's dreams interacting with their subconscious was a pretty cool idea. I also think it would be really, really awesome to be able to architect entire worlds in such detail. I'll even forgive the poor scientific execution (if dream time is 20x slower, why do we hear the music at full speed? A: because it would sound awful at 20x and unimaginably bad at 400x slowdown).
    9. If I could go back in time, I guess I'd meet Jean Sibelius. I really like his music, but more importantly, the Finnish tend to have really cool accents.
    10. I'm from California. Yes, that means I hug trees, hate wearing shoes, drink lots of wine, and look just like a character from Baywatch.

    Draw on my Blog!

    Left click anywhere to start drawing. Right click to attempt recognition and to clear the screen. Recognition is powered by the $N recognizer. Contact me if you want your blog listed; for now please pick a symbol from that web page.
