evaluation papers

There are many questions around evaluating the results from procedural graphics. A while back I wrote up the following notes about a couple of evaluation papers from recent graphics journals.

How Well Do Line Drawings Depict Shape? ( pdf | web | acm | 2009 )
Forrester Cole, Kevin Sanik, Doug DeCarlo, Adam Finkelstein, Thomas Funkhouser, Szymon Rusinkiewicz, Manish Singh

This paper is the reverse of last year's attempt - it asks a sample of people to guess the curvature of a set of objects from line drawings.

however there is little scientific evidence in the literature for these observations. Moreover, while a recent thread of the computer graphics literature devoted to the automatic algorithms for line drawings is flourishing, to date researchers have had no objective way to evaluate the effectiveness of such algorithms in depicting shape.

There are a bunch of different ways to generate contours for an image:
  • suggestive contours - marginal occluding contours (appear with small POV change)
  • ridges and valleys - local maxima and minima of surface curvature
  • apparent ridges - view dependent ridges and valleys
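
For comparison, the plain occluding contour (which suggestive contours extend) is where the surface turns away from the viewer. On a polygon mesh that is usually approximated by the edges shared by one front-facing and one back-facing face. Here's a minimal sketch of that test; the mesh layout (`face_normals`, `edge_faces`) and the function names are my own assumptions for illustration, not anything from the paper.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def silhouette_edges(face_normals, edge_faces, to_camera):
    """Return edges whose two adjacent faces point opposite ways
    relative to the viewer.

    face_normals: list of (x, y, z) normals, one per face (hypothetical layout)
    edge_faces:   dict mapping an edge id to the pair of face indices it joins
    to_camera:    vector pointing from the surface toward the camera
    """
    found = []
    for edge, (f0, f1) in edge_faces.items():
        d0 = dot(face_normals[f0], to_camera)
        d1 = dot(face_normals[f1], to_camera)
        # A sign change means one face is front-facing and the other back-facing.
        if d0 * d1 < 0:
            found.append(edge)
    return sorted(found)


# Toy data: face 0 and face 2 face the camera, face 1 faces away.
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, -0.2), (0.0, 1.0, 0.5)]
edges = {("a", "b"): (0, 1), ("b", "c"): (0, 2)}
contour = silhouette_edges(normals, edges, (0.0, 0.0, 1.0))
```

Only the edge between faces 0 and 1 comes back, since that is the only place the surface flips from facing the camera to facing away.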

Bonus points for mentioning Team Fortress 2 and using NPR (non photorealistic rendering).

Psychophysics defines a gauge figure: a 2D representation of a thumbtack, with the spike giving the normal direction. There's a body of previous work that asks people to guess normals from outlines and the like. These are the little brown dots in the above image.

The study then involved asking 560 people to position 275,000 gauges on different line drawings, to get an idea of the perceived surface direction.

An interesting idea here is the use of Amazon's Mechanical Turk to get lots of people to participate in the activity, in return for a micropayment. The catch is that there is very little control over the environment in which someone encounters the game:
  • there may be someone at the next desk playing the game, so you might be able to sneak a look at other representations of the same object
  • there might be all sorts of distractions (cat walking on the keyboard)
  • visual display will not be calibrated in the same way (different gamma curves may lead to different perceptions of normals?)
However, these situations are quite rare and more than likely compensated for by the large volume of data produced. It's kind of exciting that this method of data collection is considered scientific enough for publication.

The study compensates for the "bas-relief" effect (assumed projection by the viewer) by scaling the results to a standard depth.
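
One way to picture what a depth rescaling does to a gauge setting (not necessarily the paper's exact procedure): treat the surface as a height field seen along the z axis, so stretching depth by a factor s multiplies the x and y components of the normal by s before renormalizing. The function name and the height-field assumption below are mine.

```python
import math


def rescale_depth(normal, scale):
    """Return the unit normal after stretching depth (z) by `scale`.

    Assumes a height-field surface viewed along +z, so a normal
    proportional to (-fx, -fy, 1) becomes (-s*fx, -s*fy, 1).
    """
    nx, ny, nz = normal
    sx, sy, sz = scale * nx, scale * ny, nz
    length = math.sqrt(sx * sx + sy * sy + sz * sz)
    return (sx / length, sy / length, sz / length)


# A normal tilted 45 degrees in x, with the depth stretched by a factor of two.
tilted = (1 / math.sqrt(2), 0.0, 1 / math.sqrt(2))
steeper = rescale_depth(tilted, 2.0)
```

A normal pointing straight at the viewer is unchanged by any scale, which is why the ambiguity only shows up on slanted parts of the surface.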

The results look solid - more ambiguous figures had a higher deviation from the median surface normal.
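
To make "deviation from the median surface normal" concrete, here's a rough sketch that measures how far a set of gauge settings spreads, in degrees. The component-wise median used as the reference direction is a simplification of my own; the paper may define its median differently.

```python
import math
from statistics import median


def normalize(v):
    """Scale a 3-vector to unit length (fails on the zero vector)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def angle_deg(a, b):
    """Angle between two vectors in degrees."""
    d = sum(x * y for x, y in zip(normalize(a), normalize(b)))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, d))))


def median_normal(normals):
    """Component-wise median, renormalized (an assumed stand-in)."""
    return normalize(tuple(median(c) for c in zip(*normals)))


def mean_deviation_deg(normals):
    """Average angular distance of each setting from the median normal."""
    ref = median_normal(normals)
    return sum(angle_deg(n, ref) for n in normals) / len(normals)


# Made-up gauge settings: people agree on the first point, not on the second.
agreed = [(0.0, 0.0, 1.0), (0.05, 0.0, 1.0), (-0.05, 0.0, 1.0)]
ambiguous = [(0.0, 0.0, 1.0), (0.8, 0.0, 1.0), (-0.8, 0.0, 1.0)]
spread_agreed = mean_deviation_deg(agreed)
spread_ambiguous = mean_deviation_deg(ambiguous)
```

The ambiguous point ends up with a much larger spread than the agreed one, which is the pattern the study reports for ambiguous figures.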

The occluding contour technique was worse than the others, with the other line drawing techniques being about equal. Comfortingly, a 3D rendered/shaded image gave better results than all the line drawings.

A more detailed analysis of contentious areas of the models was performed and gave the interesting conclusion that sometimes an artist's representation was more reliable (the above computer generated image of a screwdriver has dark brown gauges showing ambiguity in the surface direction), but sometimes they overdid it and gave the wrong impression. In these cases the computer generated outlines were better.

Wilcoxon/Mann-Whitney is a non-parametric test for comparing distributions that I hadn't stumbled across before.
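
For my own reference, here's a small pure-Python sketch of the Mann-Whitney U statistic: pool both samples, rank them (ties share the average rank), and see whether one sample's ranks sit higher than chance would suggest. The p-value uses the plain normal approximation with no tie or continuity correction, so it is only rough for small samples. The function names and the error values are made up.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U_x, U_y) for two samples, using midranks for ties."""
    pooled = sorted((v, g) for g, sample in enumerate((xs, ys)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    return u_x, n1 * n2 - u_x


def approx_p_value(xs, ys):
    """Two-sided p-value from the normal approximation (no corrections)."""
    n1, n2 = len(xs), len(ys)
    u_x, _ = mann_whitney_u(xs, ys)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


# Made-up angular errors (degrees) for two drawing styles.
style_a = [1.0, 2.0, 3.0]
style_b = [4.0, 5.0, 6.0]
u_a, u_b = mann_whitney_u(style_a, style_b)
p = approx_p_value(style_a, style_b)
```

U_x counts the pairs in which a value from the first sample beats one from the second (ties count half), so a U of zero means the first sample is lower across the board.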

Toward Evaluating Lighting Design Interface Paradigms for Novice Users ( pdf | acm | 2009 )
William B. Kerr, Fabio Pellacini

It turns out there are a few interfaces for defining lighting in a 3d scene:
  • direct (Maya/3DSMax style) - drag lights around
  • indirect - drag the shadows, edge of lights, and their results around
  • painting - a constraint solution that optimizes the light's position to make the scene look most like a sketch you just coloured in (above, right two images)
This paper evaluates these different techniques, as used by untrained people. Participants are asked to match the lighting in an example (same geometry, rendering, etc.) and to reproduce the lighting of a still frame from a movie on simple geometry.

20 people took part, each using the tools for 5 hours (including warm-up time). The error differences between the rendered scenes are compared for the exact-match trial, while the movie-scene sections are evaluated in a more subjective manner. The results are also recorded over time, allowing the authors to graph how long each technique takes to converge.
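
As a sketch of how the exact-match error and the convergence time could be measured: compare the user's render against the goal image pixel by pixel, log the error as they work, and note when it first drops below a threshold. The choice of RMSE, the flat-list image format and the threshold are all assumptions of mine; these notes don't record the paper's actual metric.

```python
import math


def rmse(image_a, image_b):
    """Root-mean-square error between two images given as flat
    lists of pixel intensities of equal length."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    total = sum((a - b) ** 2 for a, b in zip(image_a, image_b))
    return math.sqrt(total / len(image_a))


def time_to_converge(error_log, threshold):
    """Return the first timestamp at which the error is at or below
    `threshold`, or None if it never gets there.

    error_log: list of (seconds, error) pairs in time order.
    """
    for seconds, error in error_log:
        if error <= threshold:
            return seconds
    return None


# Made-up session: the user's render gets closer to the goal over a minute.
goal = [0.0, 0.5, 1.0, 0.5]
snapshots = [
    (0, [1.0, 1.0, 0.0, 0.0]),
    (30, [0.2, 0.6, 0.8, 0.4]),
    (60, [0.0, 0.5, 1.0, 0.45]),
]
log = [(t, rmse(render, goal)) for t, render in snapshots]
converged_at = time_to_converge(log, 0.05)
```

Plotting `log` for each interface would give the kind of error-over-time curve the authors use to compare the techniques.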

They also do a questionnaire:
  1. natural way to think about lighting
  2. ease of determining what to modify (selection)
  3. ease of determining how it should be modified (intention)
  4. ease of carrying out the adjustment
  5. preference with few lights
  6. preference with many lights
  7. preference in matching trials (goal image)
  8. preference in open trials (no goal image)
  9. overall preference
The general result was that the newer technique, painting, doesn't do so well, and that the direct and indirect interfaces are better. Between direct and indirect, indirect gets to the result fastest.

The two statistical tools used are