SICSA Demofest 2009 video/poster

Catch a bit of (2 year old) hyperbole from me at around 39 seconds (on 'tube):

The poster I presented is below (pdf) - it is an attempt to put a commercial spin on the work that shows the future promise of the work (with about GBP 10^7 in funding)


Recursive weighted skeleton

Given the weighted straight skeleton, and two features
  • The ability to change the speed of an edge at a certain time (height/offset depending on your model) (lets call this an edge direction event)
  • The ability to add edges on a particular edge at a certain time (lets call this a plan edit)
We can devise a language that allows the plan edits after a sequence of edge direction events. The question is - can this language be recursive? Turns out the answer is yes:

Here's another more discrete variant.

This is interesting because of it's similarity to two different types of automata - fractals (such as the Koch snowflake) and cellular automata (such as the game of life). Because the speed of edges can be forward or backward we can create edge direction events to model growth and decay - this is somewhat like the game of life. And because our language allows recursion we can model fractal shapes...

I get the feeling that system has the persistence and transmission of information to create high (Wolfram) classes of behaviour according to Wolfram. Will need a better implementation to go hunting for them...


ssd's and ubuntu

So I got fed up with waiting for my new computer to boot or java to find the hundreds of files it needs, I decided to blow my spoils from a recent conference victory on a SSD hard disk for my desktop machine.

First surprise - it came in a flat envelope - thought it was a mistake at first, and they'd sent me an OEM cd, but sure enough it was an Intel 80Gb (SSDSA2MH080G2C1 to be precise...) . These things are really small, packaged in not much and must be quite robust.

Second surprise - well not really surprise, the speed boost is pretty noticeable. Overclocking takes second place to getting one of these on your development machine - top video is with my old HDD (320Gb WD Caviar drive), bottom with the Intel SSD.

So running ubuntu, there are two things I discovered people talked about - erase block alignment and trim support. Because I'll forget what I did this time, I'll note it in this blog, however I suspect that by the time anyone uses this, there will be a better guide available.

Erase block alignment

As explained here, when a SSD deletes data, but nuking entire erase blocks. On my intel disk disc drive the erase block size is 128K (128 * 2^10 bytes, different drives, such as the OCZ use different e/block sizes), so we want the sectors of the drive to align to this boundary. I used the instructions here to use the live Ubuntu CD and fdisk to ensure this was the case.
Then I reinstalled ubuntu and booted her up.

Trim support

Trim is an operating system function that tells a drive when pages are no longer in use, and can be handed back to the SSD's page-allocation system. But the default version of Ubuntu (currently lucid/10.04) uses kernel version 2.6.32, so there's no trim support. The instructions here give a painless way to upgrade the kernel to 2.6.34, once ubuntu is installed.




weighted straight skeleton source release

The straight skeleton shrinks in the boundary of a polygon, whilst tracing out the corners to create a roof-like structure. Thus is quite useful for building things like roofs or offset surfaces. I never did find an impelmentation that wasn't CGAL's, so here's a floating point implementation.


The weighted skeleton allows the edges to move at different speeds. You could even assign negative weights to edges.

Instructions: (java webstart, binary code, run at your own risk).
  • Left drag the little orange squares at the corners of the cross
  • Right drag to move the view
  • To change the weights drag the big orange circles
  • Control-click to add corners
  • Shift-click to remove corners
  • There are known bugs (for example the polygons in the UI overlap, there will be cases where it doesn't tessellate).
  • Use alt-while-dragging to snap to grid
Google code project (java) is here.

Here's an example of how to skeleton a simple triangle. Points are defined counter clockwise. If you only want the 2d projection, discard the final (z) co-ordinate of the output-faces.

        c1 = new Corner ( 0,0),
        c2 = new Corner (100,-100 ),
        c3 = new Corner (100,0 );
       Machine directionMachine = new Machine ();
       loop1.append(new Edge ( c1,c2 ) );
       loop1.append(new Edge ( c2, c3 ) );
       loop1.append(new Edge ( c3, c1 ) );
       for (Edge e : loop1)
        e.machine = directionMachine;
       Skeleton skel = new Skeleton (out, true);
       for ( Face face : skel.output.faces.values() )
           for (Loop  lp3 : face.points)
            for (Point3d pt : lp3)


copyright vs procedural content

In 2006 the Damien Hurst was accused (via) of copying a design ("valium", top) from Robert Dixon ("true daisy", bottom). Hurst is a quite well known millionaire (valium's 500 prints sold for "four and five figure sums") and Dixon a relatively unknown computer/geometric artist. At first glance it looks like Hurst just coloured in, flipped and cropped Dixon's work, but looking a bit deeper it gets more interesting.

Dixon's work is based on the patterns of spiral Phyllotaxis, a natural phenomena that describes (among other things) the arrangement of needles on a pinecone and pattern of seeds on a sunflower head.

IMGP0618 - cu - sunflower head
by RaeA

This is a old mathematical pattern that can be expressed as an formula, and was well known enough to be commented on by da Vinci. Because of this Dixon was able to write code (similar to mine, that produced my diagrams below) to generate his pattern.

Simply, Phyllotaxis is a spiral pattern of points. Although many different spacings are possible, the fantastic thing is that nature always chooses the same spacing. My diagram below shows a few possible spacings, and one jumps out as being "correct" as it covers the space most evenly. This is the pattern that is used by nature, Dixon and Hurst.


But the question is - should he be able to claim copyright on it? and is Hurst's work derivative of it? Copyright law say yes - Dixon has some claim over this figure, as he created it (no matter it's source) and has a copyright claim. The law just asks that he has the work is "original, and exhibits a degree of labour, skill or judgement". Maybe he was the first to use circles of that diameter to represent the pattern. So if I can build a general machine that with some parameter imitates Dixon's machine - does Dixon own the output of that machine?

Okay, so I built the above machine after viewing Dixon's work, at some point it probably outputs something that looks enough like Dixon's work. This is a real headache for me as a programmer - is the output of my program derivative? does Dixon own the copyright to the program he's never seen?

One answer is that I should block particular frames that contain copyrighted images. To do this in this single instance, I would have to describe Dixon's work in the program to stop it infringing. The program would then contain copyrighted material and be illegal. To do it in the general instance, for all copyrighted spiral patterns, I would need to do an international prior-art search, which would cost vastly more than the two hours it took to create the above video.

There are excuses for copyright, for example if you can prove you didn't know about the existing image, that you didn't "copy" it, then the use is mostly covered under fair deal (or fair use in the USofA). But some inventions are covered by the more powerful patents - in this case even if you come up with the idea independently yourself, the first person to be awarded the pattern hold all rights to it.

In situations like this, Intellectual Property (IP) law feels very wrong, this is an algorithmic pattern, and if I'm allowed to copyright it, or copyright a video of all possible versions of it, there is nothing left for the rest of the world to use.

If I can write a program that recreates your patented "invention" or copyrighted "original" object do you own that program? If I can write a program that will write that program, given a certain command (abracadabra!), do you own that word? do you own/control the first program and the second program? are both programs illegal (under DMCA-like legislation they probably are). Does this mean that any program that can recreate patented work is illegal? does this include a computer compiler (probably not)?
At the moment these ideas are unlikely, and limited to toy examples. But in ten years, when anyone can ask the computer to create a program that lets you edit photos ("now I want a circular brush!"), and that program infringes on hundreds of patents, it seems like IP law will have to change.

Another reason to burst the IP bubble sooner rather than later? A previous post on copyright can be found here.


rapid prototyping procedural buildings

via boing boing.

These guys have a set of templates you can send off to a rapid prototyper (or yourself) to build a kit-set for your own cathedral. You choose which sections you want printed

Then glue them together yourself -

This is what procedural architecture is all about - what they have here is a simple shape grammar! We're not far from the situation of having designs where you can define an arbitrary floor plan and style on a website, and download a solid file you can get printed. Or how about having it printed big enough to be a dog house? or somewhere to live in the summer? I can't wait for the first lorry to arrive with a procedurally generated building on the back :)


exploring frequency responses for focus

[ pdf | code ]

In the name of badly rushed science for conference deadlines I present the accumulation of a couple of week's evenings messing around learning about frequency space (pdf).

[tl;dr] This paper got accepted, and I "presented" it at the Sicsa 2010 conference. Here's the video:

While I'd always seen the little bars on music amplifiers, I'd never thought of images being represented in the same way. The bars represent the frequencies being played back at any one time. The low frequencies (slower moving bars, normally to the right) are the deep sounds, and the high frequencies (fast bars on the left) are the high sounds. It turns out they have a nice analogue in the image plane, but because we don't look at at every pixel in a photo in order from start to end over 3 minutes we never see them.

If we identify the important areas of an image for each frequency (DoG pyramid/"monolith"), we can animate over the frequency (high frequencies first, then the low ones):

We can then see that a single point in the image has different intensities at different frequencies, as the shade of grey at a point changes. So there's one of the little bar-graphs for each pixel.

I built a little application that lets you see these graphs and the spatial frequencies in an image. It's quite fun to play with, you can start it by clicking here (java webstart, binary file, run at your own risk, source code below). Wait for it to load the preview images, select one, wait for it to build the map (lots of waiting here...) and then use the right-drag to move around, wheel to scroll, and the left button to show a frequency preview for a particular location. Move the mouse out of the frame to see the representative frequency map from the work.

As you drag the point around the lil bars change to show you what the frequency content is like in that area of the image.


This was neat, and I had to do something with it, so I built a hoodicky that takes a single image and recovers the depth map from the focus of a single image. I assume that stuff in focus is near the camera, and stuff out of focus is a long way away - photography just like your Mum used to do. It turns out that not too much work has been done in this region, these guys even got a book article out about it last year.

So just to late to be hot news... but interesting none-the-less. So I decided to twist the concept a bit and use it for blur aware image resizing. The motivation being that you need big (expensive) lenses to take photos with a shallow Depth of Field, but when you resize these images, you loose that depth of field:


In the smaller images, more of the logo appears to be in focus, but it's the same image, just scaled a bit.

So we want something that keeps that proportion of each frequency the same as you scale the image. So basically it's a thing that keeps foreground/background separation when scaling an image. We can use the focus of an image (closely connected to it's frequencies) to determine the depth, as this video shows.

out of the snow

In the following, a, b and d are normal scaling, while c & e use the depth map we've calculated using the frequency map.


This uses the frequencies on the edges of the photo to classify the segmented image. This shows the same kind of thing at the top, and the segment-frequency map bottom left, and the recovered depth map bottom right.


More results (SICSA are the people who pay half my rent...):


Here a,b,c are standard scalings and d,e are from my algorithm.


Results are better than I expected for the few days I spent putting this thing together, but the basic problem is there are two or three parameters that need to be tuned for each image to take into account the noise of the image and the bias towards foreground/background classification. Good working prototype, lots of effort required to do this for real.

The write up looks like this (pdf, src below), I'm a bit certain some of the equations are wrong - but this is computer science, no one actually reads equations do they?

Source code is here. Java. I think it uses Apache licensed stuff, that that's how it should be treated, but I scrounged a fair bit of code, who knows where it's from... Main class should be FrequencyMap, and after setting up an image in that class you'll need to play around with the four constants at the top:
  • fMapThreshhold: Increase if not enough red (high freq) in the freq map, decrease if noise is being counted as red
  • scaleFac: usually okay to leave alone. If you want a frequency map with more levels, decrease this, keeping it above one.
  • filterRadius: noise reduction on edge classification. Increase to classify more edges as higher frequency
  • Q: increase to increase the number of segmentations.
It will write files out to the disk. Use at your own risk.

[edit: 22/3/10]

While out riding this weekend I figured out that it should be possible to analyse the defocused edges for evidence of higher frequencies to determine if the edge is in front of, or behind the in-focus point. More importantly if we can't see any high frequency edges in the bokeh, then it doesn't matter if that part of the edge is in front of, or behind the defocused edge...


[edit:] woo! It's been accepted (probably because it's one of the few graphics papers from Scotland). Reviewers comments follow:

Dear Author,

Thank you for your submission to the SICSA PhD Conference. We are delighted to inform you that, subject to corrections, your paper has been provisionally accepted to be presented at the conference. The comments from reviewers are included at the bottom of this email.

We will be in touch with you soon regarding the nature of the corrections that must be made in order for your paper to be included. You should note that the absolute deadline for these corrections will be MAY 19th 2010. No deadline extensions can be offered.



Paper: 21
Title: Blur Preserving Image Resizing

---------------------------- REVIEW 1 --------------------------
TITLE: Blur Preserving Image Resizing

OVERALL RATING: -1 (weak reject)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 3 (The paper is below average, but I do not feel strongly as to whether it should be accepted or rejected.)

Please provide a response to each of the following queries:

1) What was the paper about?

The paper presents a method for scaling images which attempts to retain the image's apparent depth of field by selectively blurring parts of the image. The amount of Gaussian blur applied to each segment is proportional to the depth of the scene at that segment. This depth information is automatically calculated from a single image using an augmented segmentation algorithm.

2) What claims or hypotheses did the paper make (if any)?

The paper claims that:

- When images are scaled down the apparent narrow depth of field is lost.
- It is desirable to keep the same proportion of the image with the same CoC.
- The presented method is able to retain the apparent depth of field in an image as it is scaled down.
- The presented method is capable of narrowing the depth of field apparent in images captured using commodity compact cameras to allow a larger aperature to be simulated.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The paper introduces a significant new method which can be useful in practice (as it requires very little information and/or extra equipment). The method has been clearly presented and is understandable. Plenty of images have been provided which do add to the text. The algorithm appears to perform the intended task well.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

The first claim (that apparent depth of field is lost when images are scaled down) has not been justified. The efficacy of the method at retaining the depth of field has not been quantitatively analysed, nor has it been compared to the bog-standard scaling algorithms or the naive blurring approaches. The method has only been demonstrated to work on a number of hand-chosen examples. The authors make a third claim in passing which has not been demonstrated or backed up in the paper.

The paper should be shortened down to 10 pages. Changes could be made to sentence structure (particularly in the introduction and conclusion) to make the paper more understandable. The relationship between the presented method and previous related work could be explained in more detail.

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The presented method is easy to follow. Only the presentation occasionally gets in the way.

Significance (1 - bad, 6 - good):

Originality (1 - bad, 6 - good):

It is of interest to see how the presented method relates to the existing literature on single-image depth estimation. Whilst the application area is novel, the underlying machinery may not be.

Presentation (1 - bad, 6 - good):

A number of grammatical and spelling errors exist. The paper is too long (14 pages). Some parts of the text feel unfinished (in particular the introduction and conclusion.

Validity (1 - bad, 6 - good):

The authors present a convincing technique, and also point out its potential shortcomings in the conclusion section.

Recommendation (1 - bad, 6 - good):

The paper, in its current form, is not ready for publication. First, the method needs to be analysed and compared quantitatively. Second, the writing, layout and presentation in general should be improved.

Any other comments to make to the authors:

(Changes have been capitalised)

- Grammar: '... large apertures, WHICH has led ...'
- Readability: '... That is the sharp area of focus ...'
- Readability: '... apparent narrow depth of field is lost, figure 1 ...'
- Readability: best to define the Circle of Confusion (CoC) *immediately* after it is introduced in Section 3.
- Readability: the CoC could be explained more clearly in Section 3.1
- Readability: caption for Figure 2
- Style: figure 1, figure 2, etc should be Figure 1, Figure 2, etc
- Style: best not to begin figure captions with labels (a, b, etc)
- Style: there should be a space between citation numbers and preceding text:
'... an all-focused image[2] ...' should be '... an all-focused image [2] ...'
- Typos: extra bracket at the end of the 2nd equation on page 4
- Typos: '... whilst we examine the a larger frequency space.'
- Typos: '... the frequency content of ITS border ...'
- Citations: the first citation (Wikipedia) has not been made correctly

---------------------------- REVIEW 2 --------------------------
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
Audience Appeal: 5 (Easy to follow and read with minor points remaining unclarified.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 5 (The paper is well structured and can be fully understood. However, some sentences are badly or awkwardly written.)
Validity: 4 (Valid, relevant approaches are made, but they are not convincing or may be a little impractical.)
Recommendation: 5 (The paper is good and should be included.)

Please provide a response to each of the following queries:

1) What was the paper about?
This paper describes an image resizing algorithm that improves quality by
maintaining depth perspective.

2) What claims or hypotheses did the paper make (if any)?
This paper identifies that depth perspective is lost when images
are reduced in size resulting in a loss of quality. With a growing
number of collaborative image viewing sites where images are initially
viewed as a thumbnail the loss of depth perspective results in a loss
of quality in the resized images.

An algorithm is proposed for resizing images which maintains depth
perspective. The image is segmented, each segment is classified to a
depth using the frequency and then blurred to match the new size.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?
I enjoyed reading this paper. It is a clear, well written paper with
a good structure that is easy to follow.

The problem being addressed is clearly identified and well described
both in the text and by example. Most readers will have practical
experience of this issue making the paper particularly relevant to
a general audience.

The paper makes good use of numerous examples to explain each stage
of the approach. This made it easier to understand the impact of each
stage of the resizing process even if some of the underlying technical
aspects were, by their nature, more difficult to follow.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

My biggest issue with this paper is the evaluation. A new algorithm
is proposed with evidence of its effectiveness being left to the
reader by the inclusion of 4 sample images that have been resized.
I would expect to see stronger evidence to support the proposed
approach. While a quantitative evaluation may be difficult a user
evaluation should be possible and could provide evidence both that
the problem is important and that the proposed approach is a
perceived improvement.

The assumption that in focus object is in the foreground appears to
be a very limiting restriction of the approach. Although raised as
an issue in the conclusions, this issue was not discussed in sufficient
detail and merits more discussion on the extent to which this would
limit the use of the algorithm.

The abstract is OK as far as it goes but is too short. It should be
expanded to give, at least, an indication of the methodology adopted.

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):5
Maintaining image quality on resized images is a subject area that
most of the audience will be able to relate to. This paper is likely
to be understandable and of interest to a wide spectrum of a general
audience. However, some aspects of the paper are understandably quite

Significance (1 - bad, 6 - good):4
The problem being investigated is clearly identified and appears to
be a genuine issue at least conceptually. However, given the lack of
a user evaluation it is difficult to judge the importance of the
problem and the extent to which this work solves it and improves
the quality of image resizing.

Originality (1 - bad, 6 - good):4
The work appears novel. Perhaps not in the broad approach used but at
least in the combination of techniques used and for the specific
application addressed in this paper.

Presentation (1 - bad, 6 - good):5
This paper is well written and has a good structure. Typos and minor issues
are listed below - although very few.

Validity (1 - bad, 6 - good):4
The methodology adopted appears fine and its success is demonstrated by
the inclusion of a number of sample images shown at different sizes.
However it is difficult to quantify the success of the approach on a
small number of images. A larger evaluation would provide more evidence
to support to the approach.

Recommendation (1 - bad, 6 - good):5
Overall a solid piece of work that merits a place in the conference.

Any other comments to make to the authors:

Typos and minor issues:

page2para4line4: "Alternatively[11]" --> "Alternatively [11]" numerous other examples
page2para8line7: "examine the a larger" --> "examine a larger" ?
page5para1line3: "from [9]" --> " from [9]"
page9para4line2: "allow the" --> "allows the"

---------------------------- REVIEW 3 --------------------------
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 5 (The paper is good and should be included.)

Please provide a response to each of the following queries:

1) What was the paper about?

This paper proposed a new method to extract depth information from a sigle image, which was used to resize the image while the blurriness caused by our-of-focus was maintained.

2) What claims or hypotheses did the paper make (if any)?

The authors claims a technique to calculate depth values for an image.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The results of the work are well presented and the related work are clearly stated.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The paper is clearly structured and easy to follow. It may be of interest to researchers in the fields of Computer Vision and Image Processing.

Significance (1 - bad, 6 - good):

The contribution of this work may be considered incremental in nature.

Originality (1 - bad, 6 - good):

The work has some novel aspects including the segmentation of the iso-depth region.

Presentation (1 - bad, 6 - good):

The paper is well presentated and each section is clearly linked.

Validity (1 - bad, 6 - good):

The results showed in the paper are strong evidence to the validity.

Recommendation (1 - bad, 6 - good):

The paper should be accepted.

Any other comments to make to the authors:


procedural Inc are looking for a computer vision expert!

Procedural create CityEngine, one of the few useful procedural urban tools around. Does this mean that they are looking for an easier way to create models of buildings? Reconstruction from street photographs? From aerial photographs? Why not from existing meshes (from video game artists)?

Image-based Street-side City Modeling (pdf) from Sigg09 shows some of the potential of this direction.

I'll be looking forward to seeing what these guys do next.


first year report

Sorry, it's been a little quiet around here recently. One of many reasons is that I put together my first year report (updated pdf). Notes:.
  • [edit] Having taken criticism from blog readers and supervisors alike, I've updated this. Some of the cut bits appear below - if you've already read it, don't bother reading it again, nothing significant has changed.
  • Some of the more interesting results from my time at ASU have been cut for client confidentiality. This is getting quite annoying now, will try not to fall into non-disclosure hole again.
  • It's mainly a bunch of blog posts end on end.
  • It isn't really of the quality required for a thesis.

Bits that didn't make the cut:

Possible Theses

Here I present a few possible directions that I see my work progressing in. Because of the new ground that is being covered, posing these direction as theses is a little artificial at this point.

Procedural Workflow

Useful procedural systems can be constructed by non-technical users
Three dimensional modelling packages such as Wavefront's 3D Studio Max or Maya present artists with very complicated interfaces for the creation of 3D content. These systems include data flows, such as 3DSM's hypergraph which allows different properties of a material to be edited, and construction histories. It should be possible to build a system of similar complexity, that can be used by 3D artists to enable them to build procedural objects. This would involve investigating subjects such as intelligent primitives (such as blobby objects or z-spheres), the generative modelling language (GML), and current 3D workflows.

Reverse proceduralization

Reverse engineering real world or artist created objects allows for effective proceduralization of geometry
From a design perspective reverse proceduralization, or reverse engineering allows artists to use their current tools to create a single instance, and then perform proceduralization as an after-thought. This would give the easiest integration into the current workflows of all proposed procedural solutions. Study in this subject would involve machine learning techniques to extract form from designs, after sufficient pattern-analysis has been performed.

Floating Point Robustness

It is possible to perform robust geometrical computations in a floating point environment.
One of the upshots of my work with the straight skeleton and the Voronoi tessellation is the realisation of how hard it is to work with geometry in a floating point environment. I found that the vast majority of my time was concerned with robustness issues. For example a point that is on one side of a line can appear on the other if the calculation is repeated with an indentical line travelling in the opposite direction.

There are well known limits, within which standard geometric queries fail to behave in a predictable fashion. The result is quite chaotic, and makes designing robust algorithms both time consuming and bug-prone.

There are several attempts to make computational geometry more robust, but I've found none to make it easier. The ideal goal here would be to create an library that an undergraduate could use for robust geometric manipulation, in place of CGAL or similar.


evaluation papers

There are many questions around evaluating the results from procedural graphics. I while back I wrote up the following notes about a couple of evaluation papers from recent graphics journals.

How Well Do Line Drawings Depict Shape? ( pdf | web | acm | 2009 )
Forrester Cole, Kevin Sanik, Doug DeCarlo, Adam Finkelstein, Thomas Funkhouser, Szymon Rusinkiewicz, Manish Singh

This paper is the reverse of last year's attempt - it asks a sample of people to guess the curvature of a set of objects from line drawings.

however there is little scientific evidence in the literature for these observations. Moreover, while a recent thread of the computer graphics literature devoted to the automatic algorithms for line drawings is flourishing, to date researchers have had no objective way to evaluate the effectiveness of such algorithms in depicting shape.

There are a bunch of different ways to generate contours for an image:
  • suggestive contours - marginal occluding contours (appear with small POV change)
  • ridges and valleys - local maxi-minima of surface curves
  • apparent ridges - view dependent ridges and valleys

Bonus points for mentioning Team Fortress 2 and using NPR (non photorealistic rendering).

Psychophysics defines a gauge figure - a 2d representation of a thumbtack with the spike giving the normal direction - there's a body of previous work of asking people to guess normals from outlines etc... These are the little brown dots in the above image.

The study then involved asking 560 people to position 275,000 gauges on different line drawings, to get an idea of the perceived surface direction.

An interesting idea here is the use of Amazon's Mechanical Turk to get lots of people to participate in the activity, in return for a micropayment. Basically there is very little control over the environment in which someone encounters the game -
  • there may be someone at the next desk playing the game, so you might be able to sneak a look at other representations of the same object
  • there might be all sort of distractions (cat walking on the keyboard)
  • visual display will not be calibrated in the same way (different gamma curves may lead to different perceptions of normals?)
However, these situations are quite rare and more than likely compensated for by the large volume of data produced. It's kind of exciting that this method of data collection is considered scientific enough for publication.

The study compensates for the "bas-relief" effect (assumed projection by the viewer) by scaling the results to a standard depth.

The results look solid - more ambiguous figures had a higher deviation from the median surface normal.

The occluding contour technique was worse than the others, with the other line drawing techniques being about equal. Comfortingly a 3D rendered/shaded image gave better results than all the line drawings.

A more detailed analysis of contentious areas of the models was performed and gave the interesting conclusion that sometimes and artist's representation was more reliable (the above computer generated image of a screwdriver has dark brown gauges showing ambiguity in the surface direction), but sometimes they over did it and gave the wrong impression. In these cases the computer generated outlines were better.

Wilcoxon/Mann-Whitney is a non-parametric test for comparing distributions that I hadn't stumbled across before.

Toward Evaluating Lighting Design Interface Paradigms for Novice Users ( pdf | acm | 2009 )
William B. Kerr, Fabio Pellacini

It turns out there are a few interfaces for defining lighting in a 3d scene:
  • direct (Maya/3DSMax style) - drag lights around
  • indirect - drag the shadows, edge of lights, and their results around
  • painting - a constraint solution that optimizes the light's position to make the scene look most like a sketch you just coloured in (above, right two images)
This paper evaulates these different techniques, as used by untrained people. They ask them to match the lighting in an example (same geometry, rendering etc...) and in a still frame from a movie onto simple geometry.

20 people are used, with 5 hours use per person (include warm up time...). The error differences between the rendered scenes are compared for the exact-match trial, while the movie-scene sections are evaluated in a more subjective manner. The results are also recorded over time, allowing the authors to graph how long each technique takes to converge.

They also do a questionnaire:
  1. Natural way to think about lighting
  2. ease of determining what to modify (selection)
  3. how it should be modified (intention)
  4. carryout the adjustment
  5. preference on few lights
  6. many lights
  7. preference in matching trials (goal image)
  8. open trials (no goal image)
  9. overall preference.
The general results was that the newer technique, painting doesn't do so well, and that direct and indirect illumination are better. Between direct and indirect, indirect gets to the result fastest.

The two statistical tools used are