Sunday, December 12, 2010

Reading #30: Tahuti

  • Hammond and Davis introduce Tahuti, a geometric recognition system for UML diagrams. Among other things, Tahuti can recognize a number of different arrows and arrowheads that are part of the UML domain.

    Since I find UML to be an unnecessarily complicated and unnatural representation of a software system, I don't really care for Tahuti either.

    Reading #29: Scratch Input

  • Scratch Input is a system that does simple gesture recognition using sound data. The authors use a sensor made from a modified stethoscope to turn any surface into a gesture input area. Users can scratch the surface with a fingernail or some other kind of stylus. Only a couple of gestures are currently supported.

    Using only one sensor, Scratch Input is very limited in its ability to distinguish between gestures. With two or three sensors, the authors should be able to turn the sound data into a real sketch.

    Reading #28: iCanDraw?

  • Dixon et al. introduce iCanDraw?, a system for teaching users to draw a realistic human face. The system uses computer-vision techniques to automatically create a template from any image of a person's face.

    This paper does a good job of presenting "errors" to the user, and teaching the user how to correct them.

    Reading #27: KSketch

  • Davis et al. present a sketch-based system for creating PowerPoint-like animations. Users sketch objects and then create animations for those objects, such as translation, scaling, rotation, etc.

    Since KSketch is implemented in C#, it would be cool to see it integrated into PowerPoint as an alternative, since the authors show that it is preferred to the methods PowerPoint provides.

    Reading #26: Picturephone

    This reading has been marked as a duplicate of reading #24.

    Reading #25: Descriptor for Image Retrieval

  • Eitz et al. present an image descriptor that works both on a sketch (line drawing) and a processed image where the edges have been enhanced and isolated. The descriptor can be used to do sketch-based search over a large database of images.

    Merging the sketch and image worlds is a pretty cool idea. If I know vaguely what an image looks like (or what I want an image of), then I should be able to draw a quick sketch of that image and get a real image as the result.

    Reading #24: Games for Sketch Data Collection

  • Johnson and Plimmer present several games that can be used to collect sketch data from users. In most games, sketches are created based on a textual description; participants are encouraged to make good sketches because that's how they will win the games.

    I wish SOUSA was this fun.

    Reading #23: InkSeine

  • Hinckley et al. introduce InkSeine, a system for bringing non-sketch items into sketches. While a user is taking notes, she can use the in situ search to pull in external items, all with the pen, in a natural way.

    InkSeine seems like a good way to pair notetaking with research, and it's nice not to have to switch back and forth from pen to keyboard to do both tasks.

    Reading #22: Plushie

  • Plushie is a system for turning a Teddy (Reading #21) model into designs that can be printed out to make a real-life plush toy that looks just like the model. The Teddy interface is augmented with some operations that will help the user more easily sew the pattern together.

    This is also a neat system, though my interest in 3D sketching and plush-toy creation is pretty limited.

    Reading #21: Teddy

  • Igarashi introduces Teddy, a system for turning 2D sketches into 3D models of plush toys. Teddy also introduces some interaction techniques for editing the models.

    Teddy is a fun system to play with, although it's kind of frustrating if you're as bad of an artist as I am.

    Reading #20: Mathpad2

  • Mathpad2 is a cool interactive system that lets students visualize and work on math problems. The recognition is largely handled by Microsoft, but LaViola and Zeleznik introduce a cool trick for grouping characters in a sketch, plus some neat interaction gestures (like tapping).

    After working so much on Mechanix, I feel like I can really relate to Mathpad2, a system with similar goals.

    Reading #19: Conditional Random Fields

  • Qi et al. present a grouping approach based on Conditional Random Fields. Conditional random fields are cool because they can take into account both local features of a stroke and also the stroke's interactions with the other strokes nearby.

    There's a great payoff for learning it all, but the math is so complicated that I can't stay focused with so many equations all over the place.

    Reading #18: Spatial Recognition and Grouping

  • Shilman and Viola present an application of AdaBoost to simultaneously approach the grouping and recognition problems. The idea is simple: a group of strokes belongs together if it can be recognized as something.

    It's nice to see an approach that doesn't just assume the grouping problem will be solved by someone else. There are so many symbol recognition papers that make that assumption, but not very many papers that actually contribute toward solving it.

    Reading #17: Distinguishing Text from Graphics

  • This paper presents a couple of HMM-based approaches for distinguishing text from graphics. The recognizer presented is probably similar to the one implemented by Microsoft, as mentioned in the Patel et al. paper (Reading #13).

    This paper does a good job of discussing some of the challenges of text vs shape classification, especially the heavy bias toward text in most of the data that the authors collected.

    Reading #16: Graph-Based Symbol Recognizer

  • Lee et al. present a graph-based system for combining primitives into symbols and matching them to templates. They investigate four approaches to computing the best matching between primitives in the symbol to be recognized and the primitives in the template.

    This paper would probably benefit from, well, a class like Introduction to Search 101. Stochastic search is like a super-weakened version of simulated annealing. Error-based search is just a heuristic search, but it doesn't use any framework for heuristic search like A*. The authors might consider a real implementation of simulated annealing, or maybe beefing up greedy search into a beam search.

    There's a polynomial-time exact solution to the matching problem. It's often slower than search techniques, but it does provide a much better solution.
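    For concreteness, the exact method I have in mind is bipartite matching via the Hungarian algorithm. A quick sketch (my own, not from the paper; the cost matrix of primitive-to-template dissimilarities is made up, and I lean on SciPy's implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: cost[i][j] = dissimilarity between
# primitive i of the unknown symbol and primitive j of the template.
cost = np.array([
    [0.2, 0.9, 0.7],
    [0.8, 0.1, 0.6],
    [0.5, 0.7, 0.3],
])

# Hungarian algorithm: the exact minimum-cost matching, in O(n^3) time.
rows, cols = linear_sum_assignment(cost)
pairing = list(zip(rows.tolist(), cols.tolist()))
print(pairing, cost[rows, cols].sum())  # [(0, 0), (1, 1), (2, 2)] 0.6
```

    Compare that to greedy or stochastic search, which can get stuck with a suboptimal pairing on exactly this kind of matrix.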

    Reading #15: Image-Based Trainable Symbol Recognizer

  • Kara and Stahovich present a vision-based trainable symbol recognizer. The recognizer is scale, translation, and rotation invariant, and runs very quickly. The system is an instance-based classifier, so it is easy to add new classes or new training instances of already-defined classes.

    This paper presents a good symbol recognizer that's pretty easy to implement. It's also a good introduction to a number of interesting distance metrics.

    Reading #14: Entropy

  • Bhat presents a system for distinguishing between text and shape strokes based on the entropy of the strokes. Strokes are translated into strings of characters representing the direction the stroke is heading; letters are added as the stroke changes direction, with special markers for the endpoints. A simple definition of zero-order entropy is then applied to those strings. Bhat also uses the gzip library to provide a higher-order estimate of entropy.
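    Here's roughly how I picture the zero-order idea working (my own toy sketch; the alphabet size and point data are made up, not Bhat's actual parameters, and I skip the endpoint markers):

```python
import math
from collections import Counter

def direction_string(points, bins=8):
    """Quantize each segment's direction into one of `bins` letters."""
    s = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        s.append(chr(ord('a') + int(angle / (2 * math.pi) * bins) % bins))
    return ''.join(s)

def zero_order_entropy(s):
    """Shannon entropy of the letter distribution, in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Text-like strokes wiggle (many direction changes -> higher entropy);
# shape-like strokes are smooth (few directions -> low entropy).
wiggly = direction_string([(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)])
straight = direction_string([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)])
print(zero_order_entropy(wiggly), zero_order_entropy(straight))
```

    The gzip trick replaces `zero_order_entropy` with the compressed size of the string, which implicitly captures higher-order structure.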

    Bhat further improves on the accuracy of the Patel et al. system from Reading #13.

    Reading #13: Ink Features for Diagram Recognition

  • Patel et al. present a number of features for discriminating between text and shape strokes, used with a decision tree. Similar to other approaches, the accuracy is biased toward text strokes, but the Patel system improves on previous ones.

    The Patel et al. system is very simple and computationally efficient, yet it performs the best out of all systems compared. I would have liked to see more discussion of why the authors chose to use rpart instead of other decision tree algorithms like C4.5 or algorithms specifically intended for feature subset selection.

    Monday, October 11, 2010

    Reading #12: Constellation models

  • Sharon describes a system of constellation models for sketch recognition. Constellation models describe sketches by the relationships between various objects. The objects themselves are recognized using very simple features: the bounding-box position, diagonal length, and diagonal angle. Interactions between objects are also described with very simple features: the distance between the centers of the strokes, the shortest distance between any two endpoints, and the shortest distance between any two points.

    The feature vectors are used in a maximum-likelihood search through the space of possible labelings to find the assignment of labels to strokes that maximizes the probability of the labeling given the training data. The search is broken into two steps, required components and then optional components, which drastically reduces the search space (fewer potential labels and fewer candidate strokes) and therefore the search time.
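    Those per-stroke and pairwise features are simple enough to jot down (a toy sketch of my own, assuming strokes are just lists of (x, y) points; not the authors' code):

```python
import math

def stroke_features(points):
    """Per-stroke features: bounding-box position (min corner),
    diagonal length, and diagonal angle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs), min(ys), math.hypot(w, h), math.atan2(h, w))

def pair_features(a, b):
    """Pairwise features: center distance, closest endpoint distance,
    and closest distance between any two points."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    def center(pts):
        return (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
    ends_a, ends_b = (a[0], a[-1]), (b[0], b[-1])
    return (dist(center(a), center(b)),
            min(dist(p, q) for p in ends_a for q in ends_b),
            min(dist(p, q) for p in a for q in b))

left_eye = [(0, 0), (1, 0), (1, 1)]
right_eye = [(3, 0), (4, 0), (4, 1)]
print(stroke_features(left_eye))
print(pair_features(left_eye, right_eye))
```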

    I think this is a really neat approach. With respect to the current assignment, it would be pretty cool to break course of action diagrams into components like this. For example, to identify the echelon modifier and the strength modifier.

    This approach is also somewhat similar to LADDER in that where objects sit in relation to the other objects in the sketch is very important. For example, a right eye will never be on the wrong side of the left eye. These constraints are not modeled explicitly, but are learned from the training data. The constellation model only understands positional relationships, as opposed to LADDER, which can model more complex relationships like parallelism.

    Reading #11: LADDER

  • In this paper, Tracy unleashes LADDER unto the world. LADDER is a sketching language, designed to describe sketches as combinations of geometric primitives following geometric constraints. A primitive shape is one that cannot be broken down into simpler shapes, for example a line- or arc-segment. Geometric constraints define how shapes interact with the world and with each other. For example, a constraint might describe one line as horizontal or two lines as perpendicular.
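    To illustrate the flavor of geometric constraints (this is plain Python, not LADDER's actual description language, and the tolerance values are made up):

```python
import math

def is_horizontal(line, tol=0.1):
    """LADDER-style constraint: the line's angle is near 0 or pi."""
    (x0, y0), (x1, y1) = line
    angle = math.atan2(y1 - y0, x1 - x0)
    return min(abs(angle), abs(abs(angle) - math.pi)) < tol

def is_perpendicular(l1, l2, tol=0.1):
    """Constraint between two lines: angles differ by ~90 degrees."""
    def ang(l):
        return math.atan2(l[1][1] - l[0][1], l[1][0] - l[0][0])
    diff = abs(ang(l1) - ang(l2)) % math.pi
    return abs(diff - math.pi / 2) < tol

# An arrow shaft with a perpendicular crossbar
shaft = ((0, 0), (10, 0))
bar = ((5, -2), (5, 2))
print(is_horizontal(shaft), is_perpendicular(shaft, bar))  # True True
```

    A complex shape in LADDER is then just a conjunction of such constraints over its component primitives.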

    LADDER builds complex shapes up from combinations of simpler shapes, simultaneously tackling the grouping and recognition problems. The downside to this approach is that putting a large number of primitives on the screen will cause an exponential runtime explosion, which has been known to injure innocent bystanders with recursively-enumerable shrapnel.

    Tracy notes that LADDER works well for shapes that can be described in terms of geometric primitives, especially shapes that don't have too much detail. How, then, should we deal with detailed shapes? This is one area of consideration for the current assignment. Is there a general way of approaching detailed shapes, or does each shape require custom-tailored recognition techniques?

    Reading #10: Herot

  • Christopher Herot implements the Sezgin/Stahovich corner finder more than 25 years before they do. Or did he? I see a 1971 citation by Negroponte and Taggart called "HUNCH: an experiment in sketch recognition." Maybe it really was more like 30 years before, and not necessarily Herot.

    In my estimation, Herot's real contribution in this paper is the exploration of context and user-guided recognition. Herot discusses training the corner finder to be parameterized for a single user. For example, if the user draws a square and tells STRAIN that she has done so, then STRAIN can be adjusted until it finds exactly five corners (including both endpoints).

    Herot talks about representing sketches in a context-free structure. This reminds me of LADDER, whose Lisp-like language can probably be parsed by a simple push-down automaton. Okay, that's just a fancy way of saying that HUNCH and LADDER consider the context of a sketch in very similar ways.

    Sunday, September 26, 2010

    Reading #9: Paleo

  • Paulson introduces PaleoSketch, a system for recognizing primitive shapes. A primitive shape is a simple shape drawn with one stroke, many of which cannot easily be broken down into simpler shapes. Paleo uses lots of rules to classify shapes as one of:

    • Line
    • Polyline
    • Arc
    • Circle
    • Ellipse
    • Helix
    • Spiral
    • Curve
    Paleo recognizes more shapes than the previous work (Sezgin) and also achieves much higher accuracy.

    PaleoSketch is an important tool, because we have it now and it does a pretty decent job of solving an important problem. That being said, I don't like it one bit.

    Paleo (as described in the paper) has 26 parameters. 26! That's a lot. And they're all "empirically determined", which means hand-tuned. If you add a new primitive shape, do you have to re-tune all 26 parameters plus the new ones you're bound to introduce if you extend the approach? If you turn off some primitives, are you working with an unoptimized parameter set? If the parameters aren't learned, there's no way for the end user to retrain them.

    Next we come to the hierarchy. I have so many questions here. Is it overfit to the particular training case (single primitive shapes drawn in isolation)? I wrote a new primitive fit, okay, now where do I put it in the hierarchy? Or maybe I turned off some primitive fits, is there a better hierarchy for my new set? Again, how can I customize this to my particular needs without some painstaking, manual process?

    Now the reason for the hierarchy: a line fit is not comparable to an arc fit is not comparable to a helix fit. If you can't compare fits to each other, can you possibly give me a reasonable estimate of a confidence for the fit? Let's consider the curvy-polyline case: maybe I want to know it's a curve that's also pretty darn close to a polyline. If all Paleo can tell me is that it's more curve than polyline, how do I know if polyline is even a reasonable runner-up? Until you can boast 100% always, I'd really like to have a reasonable estimate of your confidence (and if you have that for all your fits, it would seem they're comparable to each other).

    Sure, it's obvious that Paleo runs in real time. But how much time does it leave for a high-level recognizer to also run and still call it "real time"? Calculating individual fits certainly seems like an extensible approach, but putting all of the features through a classifier (linear, quadratic, SVM, neural net, decision tree, whatever) would probably be faster. I think the extensibility argument is moot given the implications of parameter tuning and fitting new fits into the hierarchy. All that's left is to wonder which does a better job, accuracy-wise.

    Monday, September 20, 2010

    Reading #7: Sezgin Early Stroke Processing

    This is a seminal work in corner finding. I've heard lots about it from Aaron over the course of his various talks. Sezgin combines speed and curvature data to filter through a set of potential corners. The segments are then fit to lines or bezier segments.
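    As I understand it, the core idea can be sketched like this (a simplified toy version of my own; the thresholds are made up, and the real system filters and merges candidates much more carefully):

```python
import math

def corner_candidates(points, times, speed_frac=0.5, curv_thresh=0.6):
    """Sezgin-style candidate selection (simplified): flag points where
    pen speed is well below average or the turning angle is high."""
    # Speed of each segment between consecutive samples.
    speeds = [math.dist(p, q) / max(t1 - t0, 1e-9)
              for p, q, (t0, t1) in zip(points, points[1:],
                                        zip(times, times[1:]))]
    avg = sum(speeds) / len(speeds)
    corners = []
    for i in range(1, len(points) - 1):
        # Curvature proxy: turning angle at point i.
        a = math.atan2(points[i][1] - points[i - 1][1],
                       points[i][0] - points[i - 1][0])
        b = math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0])
        turn = abs((b - a + math.pi) % (2 * math.pi) - math.pi)
        slow = min(speeds[i - 1], speeds[i]) < speed_frac * avg
        if turn > curv_thresh or slow:
            corners.append(i)
    return corners

# An "L" stroke drawn at a constant pace: the bend at index 2 is found.
pts = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
ts = [0.0, 0.1, 0.2, 0.3, 0.4]
print(corner_candidates(pts, ts))  # [2]
```

    The clever part of the real paper is how the speed-based and curvature-based candidate sets are combined and pruned, which this sketch glosses over.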

    Sezgin also discusses some beautification techniques.

    • Constraint satisfaction to make ends meet and parallel slopes parallel, etc.
    • Replacing recognized strokes with a predefined symbol

    I like the use of bezier segments. Most segmentation papers either restrict sketches to polylines or use arc segments. I've never liked arc segments because they aren't descriptive. My solution would be ellipse segments, which are more configurable. The drawback is that a straight line can match the edge of a really long, flat ellipse. I think it would be pretty simple to constrain the ellipses to prevent this kind of action, though.

    Bezier segments, on the other hand, are 3rd-degree polynomials that can approximate most strokes pretty well. For example, the Microsoft ink library does some bezier approximation of strokes to smooth them out, and it looks really nice. My worry would be that bezier curves are too descriptive and can approximate something complex enough that it really should be segmented.

    "Correct" isn't really defined in the evaluation section. 96% is really good, and the cynic in me is asking: what's up? Why isn't the system in this paper the de facto standard for corner finding? Why didn't the reviewers ask how "correct" was defined? Clarify, please, so I don't have to doubt you. Until then, my assumption is that the statistics are being used to lie. It may be unintentional, but I'm still very turned off by it. Opaque is bad; transparent is good. (Can you tell that I like OSS?)

    Reading #8: $N

    $N is a natural extension of the $1 approach to multi-stroke symbols. It lifts restrictions on (or removes the ability to specify) stroke order and direction, moving it closer to a free-sketch symbol recognizer. It can also lift the stroke-count restriction, though that restriction can be kept if desired (for speed, or as an extra constraint).

    Very much in the spirit of $1, $N is very simple. The templates are very similar to $1 templates: each is a single combined "stroke" with the same number of resampled points. Matching two templates also works the same. Instead of figuring out how to put together N strokes into 1, $N just tries all possible combinations of stroke order and direction. Simple, thorough.
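    The brute-force enumeration is easy to write down (my own sketch; `unistroke_orderings` is a name I made up, not from the paper):

```python
from itertools import permutations, product
from math import factorial

def unistroke_orderings(strokes):
    """Every way to concatenate N strokes into one combined stroke:
    all N! orders, each stroke forwards or backwards -> N! * 2^N."""
    for order in permutations(strokes):
        for flips in product((False, True), repeat=len(order)):
            yield [p for s, f in zip(order, flips)
                     for p in (reversed(s) if f else s)]

two = [[(0, 0), (1, 1)], [(2, 0), (3, 1)]]
print(sum(1 for _ in unistroke_orderings(two)))  # 2! * 2^2 = 8
print(factorial(5) * 2**5, factorial(7) * 2**7)  # 3840 645120
```

    Those last two numbers are the ">3000" and ">645000" template counts I complain about below.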

    I think anyone who makes a multi-stroke version of $1 is automatically a pretty cool person. I mean, it really was just asking for it. $N is a good effort. Really, really in the spirit of $1. Super duper simple.

    I disagree with some of the design choices. Including the "in the air" portions of the symbol introduces ambiguity (x vs alpha (the alpha glyph for this font looks terrible, sorry)) and dilutes the data (longer strokes means more space between points when resampled). Having complexity based on the number of strokes is a tough battle. Having O(N!*2^N) complexity in the number of strokes is an even tougher battle. 5 strokes means >3000 templates, 6 strokes goes up to >46000, 7 strokes is >645000. As stated, this limits the expressivity of the template set.

    That being said, no one is claiming $N is some sort of silver bullet. In fact, the authors are very upfront about the limitations of $N.

    Monday, September 13, 2010

    Reading #6: Protractor

    Protractor applies calculus to $1 to get rid of the rotation search. It also uses the cosine similarity metric instead of the Euclidean distance metric. These small differences give it a significant advantage when tested on a large dataset of Latin characters drawn with fingers on a mobile device. Finally, Protractor achieves the same or better accuracy compared to $1 when using 16 PPS instead of 64, which helps reduce computation time and memory use.
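    The cosine part is easy to sketch (a toy version of my own; note this ignores Protractor's closed-form rotation adjustment, which is the paper's clever calculus bit):

```python
import math

def flatten(points):
    """A resampled gesture as one flat vector: [x1, y1, x2, y2, ...]."""
    return [c for p in points for c in p]

def cosine_similarity(u, v):
    """Compare two gestures by the angle between their flat vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Two 16-point gestures resampled to the same length (toy data).
g1 = flatten([(i, i) for i in range(1, 17)])
g2 = flatten([(i, i * 1.1) for i in range(1, 17)])
print(cosine_similarity(g1, g2))  # close to 1.0 -> a strong match
```

    The template with the highest similarity (smallest angle) wins, just as the smallest distance wins in $1.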

    I mentioned before that I liked $1 as a platform to build stuff off of. Here's another good example of customizing the base algorithm to add something new like rotation sensitivity.

    I would like to see a sensitivity analysis on the points per shape for both $1 and Protractor. Until I see that, I don't think it's quite fair to say that Protractor requires 1/4 the memory of $1, since you could just use 16 PPS as the constant in the original $1 algorithm, too.

    Using the cosine distance is clever; I've never really thought about treating stroke data as a single n-dimensional vector before, rather than as a collection of 2- or 3-dimensional vectors (depending on whether time data is present). It does open a can of worms, though, wondering about other distance or similarity metrics beyond Euclidean and cosine.

    Reading #5: $1

    The $1 algorithm is a "cheap trick" to achieve single-stroke gesture recognition in almost any setting. The code is simple enough to write in any language and run on any platform. After all, that was the whole point of the algorithm. Even though it was originally intended for interface prototyping, there have been many applications built using the $1 as the core recognition engine.

    $1 is a template matching algorithm. It performs the same pre-processing on training and testing examples:

    1. Resample the stroke to have PPS points
    2. Calculate the centroid and the indicative angle (angle from centroid to 1st point)
    3. Standardize: translate centroid to (0,0), scale to standard bounding box, rotate indicative angle to 0
    For testing examples, there is a fourth step: test against each of the templates; the closest match wins.
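    The whole pipeline fits in a page of Python (my own sketch following the steps above; I use 16 points per stroke rather than the paper's 64, and the exact ordering of the normalization steps varies by writeup):

```python
import math

PPS = 16  # points per stroke; the $1 paper uses 64

def resample(points, n=PPS):
    """Step 1: redistribute the stroke into n equally spaced points."""
    pts = list(points)
    interval = sum(math.dist(p, q) for p, q in zip(pts, pts[1:])) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    if len(out) < n:          # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def normalize(pts, size=250.0):
    """Steps 2-3: rotate the indicative angle to zero, scale to a
    standard box, then translate the centroid to the origin."""
    cx, cy = centroid(pts)
    theta = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)  # indicative angle
    c, s = math.cos(theta), math.sin(theta)
    pts = [((x - cx) * c - (y - cy) * s,
            (x - cx) * s + (y - cy) * c) for x, y in pts]
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Step 4: average point-to-point distance; closest template wins."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

stroke = normalize(resample([(0, 0), (10, 0), (10, 10)]))
print(len(stroke), path_distance(stroke, stroke))  # 16 0.0
```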

    I really like $1 as a platform to build cool stuff with. People have used it for games. I used it to build an almost-as-simple free sketch recognizer. Its simplicity makes it really easy to customize to your exact needs.

    For some reason, it's taken me until now to wonder: what about k-NN instead of 1-NN? I wonder if there's any benefit to that. I would guess you'd need more than the 1-2 examples per class featured on the website version for it to be effective.

    Wednesday, September 8, 2010

    Reading #4: Sketchpad

    In this paper, Ivan Sutherland pioneers the (now gigantic) fields of computer graphics and human computer interaction, invents the idea of a sketch as input to external computational applications, and creates a library of reusable sketch components. All in a day's work.

    I had forgotten that solving for the forces on a bridge was one of the applications with Sketchpad. That sounds an awful lot like another project I've heard of before. I guess I should make sure to cite this paper when I write up Mechanix.

    Monday, September 6, 2010

    Reading #3: Long

    Long introduces Quill, a system to help UI designers pick a good set of gestures. Picking a good set of gestures will reduce human confusion and recognizer confusion, both good things. Quill also allows users to train the recognizer on the fly.

    This tool is like a good scope on a rifle. It increases efficiency and ease of use. However, I usually disagree with shooting people. That may be a strong analogy, but I doubt you'll see me advocating the use of gestures in a sketch system.

    Reading #2: Rubine

    This paper is a classic. It begins by explaining GRANDMA, a gesture-based sketching system. I'm not going to talk much about it, because that's not what's important about this work. In order to power GRANDMA, Rubine developed a gesture recognition system by combining 13 intuitive but creative features with a classic linear discriminator. The real contribution of the work is the set of 13 features.

    Rubine's features are really pretty cool. They turn a bunch of data into a handful of really intuitive descriptors for a stroke. Humans really get "how curvy it is", "how square it is", "how long it is", "how fast it was drawn". Apparently, computers really get it, too, because recognition is quite good.
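    A few of those intuitive descriptors are easy to compute (a toy sketch covering only a handful of the 13 features, with my own names rather than Rubine's exact definitions):

```python
import math

def rubine_like_features(points, times):
    """A few Rubine-style features: total length ("how long"), total
    signed rotation, absolute rotation ("how curvy"), and duration
    ("how fast"). Not the full 13-feature set."""
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    angles = [math.atan2(q[1] - p[1], q[0] - p[0])
              for p, q in zip(points, points[1:])]
    # Signed turning angle at each interior point, wrapped to (-pi, pi].
    turns = [(b - a + math.pi) % (2 * math.pi) - math.pi
             for a, b in zip(angles, angles[1:])]
    return {
        'length': length,
        'total_rotation': sum(turns),               # loops vs. zigzags
        'abs_rotation': sum(abs(t) for t in turns), # "how curvy"
        'duration': times[-1] - times[0],
    }

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(rubine_like_features(square, [0.0, 0.1, 0.2, 0.3, 0.4]))
```

    A feature vector like this, one per training gesture, is exactly what the linear discriminator consumes.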

    My summary is pretty filled with opinion, so do read that even if you think you know what the paper's all about. To be quite honest, I often cringe a little when I think of Rubine. It's been around for so long, and I've heard it so many times, sometimes it makes me want to shout "JUST MOVE ON ALREADY". But there's a good reason that Rubine comes up so often. It's really simple (if you sort of ignore the math as just part of the algorithm) and it's got a really fast classification speed. I wonder if a significant improvement could be made by applying a newer, non-linear machine learning algorithm to Rubine's features.

    Wednesday, September 1, 2010

    Hammond Gesture Recognition Survey Chapter

    This chapter surveys gesture recognition techniques. The chapter begins with Rubine's feature-based algorithm, which was one of the first gesture recognition algorithms, and among the first to achieve such high accuracy. Next, Hammond presents the Long extensions to Rubine's feature set. These two algorithms (both the original and the extended) are commonly used today and usually find good results.

    Finally, Hammond introduces Wobbrock's $1 recognition algorithm, which is based on template matching as opposed to feature-based classification. The $1 algorithm is especially notable because it's so simple a small child could implement it, yet the accuracy is comparable to Rubine and Long.

    This chapter gives a very good brief survey of gesture recognition techniques. Rubine is basically the seminal paper in the field, and Long is a natural extension. Wobbrock provides a good balance with a really different approach, and demonstrates that an algorithm does not need to be complex to work well. The Electronic Cocktail Napkin might be an appropriate inclusion because it is also vastly different from Rubine/Long or Wobbrock.

    In the case of Rubine, the survey chapter has a much clearer focus on the meaning and importance of each feature, and gives less importance to GRANDMA, which has not aged as well as the core algorithm. After reading such a good, thorough overview of the algorithm, I'm questioning why we're reading the original paper, too. In the case of $1, the original paper is just as clear, just as focused, and more detailed. I would rather have just read that.

    Since I'm simple minded and easily distracted, Aaron's notes and the missing figures were distracting.

    Questionnaire

    1. martin.field@gmail
    2. I am a 2nd year PhD student.
    3. I was told I would learn how to cure my terminal disease.
    4. I've taken "this class" as offered by Christine Alvarado at Harvey Mudd. I've spent a year and two summers doing research in sketch recognition.
    5. In 10 years, I expect to be teaching a class very much like this one.
    6. I think the next biggest technological advancement in computer science will be sliced breadboards.
    7. My favorite course as an undergraduate was Algorithms, mostly due to the professor. Ask Drew; I must have told him 100 times how awesome the professor was.
    8. No movies seem to have left a lasting impression on me. Recently, the best movie I've seen is Inception. Walking around in people's dreams interacting with their subconscious was a pretty cool idea. I also think it would be really, really awesome to be able to architect entire worlds in such detail. I'll even forgive the poor scientific execution (if dream time is 20x slower, why do we hear the music at full speed? A: because it would sound awful at 20x and unimaginably bad at 400x slowdown).
    9. If I could go back in time, I guess I'd meet Jean Sibelius. I really like his music, but more importantly, the Finnish tend to have really cool accents.
    10. I'm from California. Yes, that means I hug trees, hate wearing shoes, drink lots of wine, and look just like a character from Baywatch.