Wednesday, March 31, 2010

In 2006 the Damien Hurst was accused (via) of copying a design ("valium", top) from Robert Dixon ("true daisy", bottom). Hurst is a quite well known millionaire (valium's 500 prints sold for "four and five figure sums") and Dixon a relatively unknown computer/geometric artist. At first glance it looks like Hurst just coloured in, flipped and cropped Dixon's work, but looking a bit deeper it gets more interesting.

Dixon's work is based on the patterns of spiral Phyllotaxis, a natural phenomena that describes (among other things) the arrangement of needles on a pinecone and pattern of seeds on a sunflower head.

by RaeA

This is a old mathematical pattern that can be expressed as an formula, and was well known enough to be commented on by da Vinci. Because of this Dixon was able to write code (similar to mine, that produced my diagrams below) to generate his pattern.

Simply, Phyllotaxis is a spiral pattern of points. Although many different spacings are possible, the fantastic thing is that nature always chooses the same spacing. My diagram below shows a few possible spacings, and one jumps out as being "correct" as it covers the space most evenly. This is the pattern that is used by nature, Dixon and Hurst.

But the question is - should he be able to claim copyright on it? and is Hurst's work derivative of it? Copyright law say yes - Dixon has some claim over this figure, as he created it (no matter it's source) and has a copyright claim. The law just asks that he has the work is "original, and exhibits a degree of labour, skill or judgement". Maybe he was the first to use circles of that diameter to represent the pattern. So if I can build a general machine that with some parameter imitates Dixon's machine - does Dixon own the output of that machine?

Okay, so I built the above machine after viewing Dixon's work, at some point it probably outputs something that looks enough like Dixon's work. This is a real headache for me as a programmer - is the output of my program derivative? does Dixon own the copyright to the program he's never seen?

One answer is that I should block particular frames that contain copyrighted images. To do this in this single instance, I would have to describe Dixon's work in the program to stop it infringing. The program would then contain copyrighted material and be illegal. To do it in the general instance, for all copyrighted spiral patterns, I would need to do an international prior-art search, which would cost vastly more than the two hours it took to create the above video.

There are excuses for copyright, for example if you can prove you didn't know about the existing image, that you didn't "copy" it, then the use is mostly covered under fair deal (or fair use in the USofA). But some inventions are covered by the more powerful patents - in this case even if you come up with the idea independently yourself, the first person to be awarded the pattern hold all rights to it.

In situations like this, Intellectual Property (IP) law feels very wrong, this is an algorithmic pattern, and if I'm allowed to copyright it, or copyright a video of all possible versions of it, there is nothing left for the rest of the world to use.

If I can write a program that recreates your patented "invention" or copyrighted "original" object do you own that program? If I can write a program that will write that program, given a certain command (abracadabra!), do you own that word? do you own/control the first program and the second program? are both programs illegal (under DMCA-like legislation they probably are). Does this mean that any program that can recreate patented work is illegal? does this include a computer compiler (probably not)?
At the moment these ideas are unlikely, and limited to toy examples. But in ten years, when anyone can ask the computer to create a program that lets you edit photos ("now I want a circular brush!"), and that program infringes on hundreds of patents, it seems like IP law will have to change.

Another reason to burst the IP bubble sooner rather than later? A previous post on copyright can be found here.

Monday, March 22, 2010

rapid prototyping procedural buildings

via boing boing.

These guys have a set of templates you can send off to a rapid prototyper (or yourself) to build a kit-set for your own cathedral. You choose which sections you want printed

Then glue them together yourself -

This is what procedural architecture is all about - what they have here is a simple shape grammar! We're not far from the situation of having designs where you can define an arbitrary floor plan and style on a website, and download a solid file you can get printed. Or how about having it printed big enough to be a dog house? or somewhere to live in the summer? I can't wait for the first lorry to arrive with a procedurally generated building on the back :)

Friday, March 19, 2010

exploring frequency responses for focus

In the name of badly rushed science for conference deadlines I present the accumulation of a couple of week's evenings messing around learning about frequency space (pdf).

[tl;dr] This paper got accepted, and I "presented" it at the Sicsa 2010 conference. Here's the video:

While I'd always seen the little bars on music amplifiers, I'd never thought of images being represented in the same way. The bars represent the frequencies being played back at any one time. The low frequencies (slower moving bars, normally to the right) are the deep sounds, and the high frequencies (fast bars on the left) are the high sounds. It turns out they have a nice analogue in the image plane, but because we don't look at at every pixel in a photo in order from start to end over 3 minutes we never see them.

If we identify the important areas of an image for each frequency (DoG pyramid/"monolith"), we can animate over the frequency (high frequencies first, then the low ones):

We can then see that a single point in the image has different intensities at different frequencies, as the shade of grey at a point changes. So there's one of the little bar-graphs for each pixel.

I built a little application that lets you see these graphs and the spatial frequencies in an image. It's quite fun to play with, you can start it by clicking here (java webstart, binary file, run at your own risk, source code below). Wait for it to load the preview images, select one, wait for it to build the map (lots of waiting here...) and then use the right-drag to move around, wheel to scroll, and the left button to show a frequency preview for a particular location. Move the mouse out of the frame to see the representative frequency map from the work.

As you drag the point around the lil bars change to show you what the frequency content is like in that area of the image.

This was neat, and I had to do something with it, so I built a hoodicky that takes a single image and recovers the depth map from the focus of a single image. I assume that stuff in focus is near the camera, and stuff out of focus is a long way away - photography just like your Mum used to do. It turns out that not too much work has been done in this region, these guys even got a book article out about it last year.

So just to late to be hot news... but interesting none-the-less. So I decided to twist the concept a bit and use it for blur aware image resizing. The motivation being that you need big (expensive) lenses to take photos with a shallow Depth of Field, but when you resize these images, you loose that depth of field:

In the smaller images, more of the logo appears to be in focus, but it's the same image, just scaled a bit.

So we want something that keeps that proportion of each frequency the same as you scale the image. So basically it's a thing that keeps foreground/background separation when scaling an image. We can use the focus of an image (closely connected to it's frequencies) to determine the depth, as this video shows.

In the following, a, b and d are normal scaling, while c & e use the depth map we've calculated using the frequency map.

This uses the frequencies on the edges of the photo to classify the segmented image. This shows the same kind of thing at the top, and the segment-frequency map bottom left, and the recovered depth map bottom right.

More results (SICSA are the people who pay half my rent...):

Here a,b,c are standard scalings and d,e are from my algorithm.

Results are better than I expected for the few days I spent putting this thing together, but the basic problem is there are two or three parameters that need to be tuned for each image to take into account the noise of the image and the bias towards foreground/background classification. Good working prototype, lots of effort required to do this for real.

The write up looks like this (pdf, src below), I'm a bit certain some of the equations are wrong - but this is computer science, no one actually reads equations do they?

Source code is here. Java. I think it uses Apache licensed stuff, that that's how it should be treated, but I scrounged a fair bit of code, who knows where it's from... Main class should be FrequencyMap, and after setting up an image in that class you'll need to play around with the four constants at the top:
• fMapThreshhold: Increase if not enough red (high freq) in the freq map, decrease if noise is being counted as red
• scaleFac: usually okay to leave alone. If you want a frequency map with more levels, decrease this, keeping it above one.
• filterRadius: noise reduction on edge classification. Increase to classify more edges as higher frequency
• Q: increase to increase the number of segmentations.
It will write files out to the disk. Use at your own risk.

[edit: 22/3/10]

While out riding this weekend I figured out that it should be possible to analyse the defocused edges for evidence of higher frequencies to determine if the edge is in front of, or behind the in-focus point. More importantly if we can't see any high frequency edges in the bokeh, then it doesn't matter if that part of the edge is in front of, or behind the defocused edge...

[edit:] woo! It's been accepted (probably because it's one of the few graphics papers from Scotland). Reviewers comments follow:

Dear Author,

Thank you for your submission to the SICSA PhD Conference. We are delighted to inform you that, subject to corrections, your paper has been provisionally accepted to be presented at the conference. The comments from reviewers are included at the bottom of this email.

We will be in touch with you soon regarding the nature of the corrections that must be made in order for your paper to be included. You should note that the absolute deadline for these corrections will be MAY 19th 2010. No deadline extensions can be offered.

Regards

---------------------------------------------

Paper: 21
Title: Blur Preserving Image Resizing

---------------------------- REVIEW 1 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: -1 (weak reject)
REVIEWER'S CONFIDENCE: 1 (low)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 3 (The paper is below average, but I do not feel strongly as to whether it should be accepted or rejected.)

Please provide a response to each of the following queries:

1) What was the paper about?

The paper presents a method for scaling images which attempts to retain the image's apparent depth of field by selectively blurring parts of the image. The amount of Gaussian blur applied to each segment is proportional to the depth of the scene at that segment. This depth information is automatically calculated from a single image using an augmented segmentation algorithm.

2) What claims or hypotheses did the paper make (if any)?

The paper claims that:

- When images are scaled down the apparent narrow depth of field is lost.
- It is desirable to keep the same proportion of the image with the same CoC.
- The presented method is able to retain the apparent depth of field in an image as it is scaled down.
- The presented method is capable of narrowing the depth of field apparent in images captured using commodity compact cameras to allow a larger aperature to be simulated.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The paper introduces a significant new method which can be useful in practice (as it requires very little information and/or extra equipment). The method has been clearly presented and is understandable. Plenty of images have been provided which do add to the text. The algorithm appears to perform the intended task well.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

The first claim (that apparent depth of field is lost when images are scaled down) has not been justified. The efficacy of the method at retaining the depth of field has not been quantitatively analysed, nor has it been compared to the bog-standard scaling algorithms or the naive blurring approaches. The method has only been demonstrated to work on a number of hand-chosen examples. The authors make a third claim in passing which has not been demonstrated or backed up in the paper.

The paper should be shortened down to 10 pages. Changes could be made to sentence structure (particularly in the introduction and conclusion) to make the paper more understandable. The relationship between the presented method and previous related work could be explained in more detail.

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The presented method is easy to follow. Only the presentation occasionally gets in the way.

Significance (1 - bad, 6 - good):

Originality (1 - bad, 6 - good):

It is of interest to see how the presented method relates to the existing literature on single-image depth estimation. Whilst the application area is novel, the underlying machinery may not be.

Presentation (1 - bad, 6 - good):

A number of grammatical and spelling errors exist. The paper is too long (14 pages). Some parts of the text feel unfinished (in particular the introduction and conclusion.

Validity (1 - bad, 6 - good):

The authors present a convincing technique, and also point out its potential shortcomings in the conclusion section.

Recommendation (1 - bad, 6 - good):

The paper, in its current form, is not ready for publication. First, the method needs to be analysed and compared quantitatively. Second, the writing, layout and presentation in general should be improved.

Any other comments to make to the authors:

(Changes have been capitalised)

- Grammar: '... large apertures, WHICH has led ...'
- Readability: '... That is the sharp area of focus ...'
- Readability: '... apparent narrow depth of field is lost, figure 1 ...'
- Readability: best to define the Circle of Confusion (CoC) *immediately* after it is introduced in Section 3.
- Readability: the CoC could be explained more clearly in Section 3.1
- Readability: caption for Figure 2
- Style: figure 1, figure 2, etc should be Figure 1, Figure 2, etc
- Style: best not to begin figure captions with labels (a, b, etc)
- Style: there should be a space between citation numbers and preceding text:
'... an all-focused image[2] ...' should be '... an all-focused image [2] ...'
- Typos: extra bracket at the end of the 2nd equation on page 4
- Typos: '... whilst we examine the a larger frequency space.'
- Typos: '... the frequency content of ITS border ...'
- Citations: the first citation (Wikipedia) has not been made correctly

---------------------------- REVIEW 2 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
REVIEWER'S CONFIDENCE: 1 (low)
Audience Appeal: 5 (Easy to follow and read with minor points remaining unclarified.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 5 (The paper is well structured and can be fully understood. However, some sentences are badly or awkwardly written.)
Validity: 4 (Valid, relevant approaches are made, but they are not convincing or may be a little impractical.)
Recommendation: 5 (The paper is good and should be included.)

Please provide a response to each of the following queries:

1) What was the paper about?
This paper describes an image resizing algorithm that improves quality by
maintaining depth perspective.

2) What claims or hypotheses did the paper make (if any)?
This paper identifies that depth perspective is lost when images
are reduced in size resulting in a loss of quality. With a growing
number of collaborative image viewing sites where images are initially
viewed as a thumbnail the loss of depth perspective results in a loss
of quality in the resized images.

An algorithm is proposed for resizing images which maintains depth
perspective. The image is segmented, each segment is classified to a
depth using the frequency and then blurred to match the new size.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?
I enjoyed reading this paper. It is a clear, well written paper with
a good structure that is easy to follow.

The problem being addressed is clearly identified and well described
both in the text and by example. Most readers will have practical
experience of this issue making the paper particularly relevant to
a general audience.

The paper makes good use of numerous examples to explain each stage
of the approach. This made it easier to understand the impact of each
stage of the resizing process even if some of the underlying technical
aspects were, by their nature, more difficult to follow.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

My biggest issue with this paper is the evaluation. A new algorithm
is proposed with evidence of its effectiveness being left to the
reader by the inclusion of 4 sample images that have been resized.
I would expect to see stronger evidence to support the proposed
approach. While a quantitative evaluation may be difficult a user
evaluation should be possible and could provide evidence both that
the problem is important and that the proposed approach is a
perceived improvement.

The assumption that in focus object is in the foreground appears to
be a very limiting restriction of the approach. Although raised as
an issue in the conclusions, this issue was not discussed in sufficient
detail and merits more discussion on the extent to which this would
limit the use of the algorithm.

The abstract is OK as far as it goes but is too short. It should be
expanded to give, at least, an indication of the methodology adopted.

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):5
Maintaining image quality on resized images is a subject area that
most of the audience will be able to relate to. This paper is likely
to be understandable and of interest to a wide spectrum of a general
audience. However, some aspects of the paper are understandably quite
technical.

Significance (1 - bad, 6 - good):4
The problem being investigated is clearly identified and appears to
be a genuine issue at least conceptually. However, given the lack of
a user evaluation it is difficult to judge the importance of the
problem and the extent to which this work solves it and improves
the quality of image resizing.

Originality (1 - bad, 6 - good):4
The work appears novel. Perhaps not in the broad approach used but at
least in the combination of techniques used and for the specific

Presentation (1 - bad, 6 - good):5
This paper is well written and has a good structure. Typos and minor issues
are listed below - although very few.

Validity (1 - bad, 6 - good):4
The methodology adopted appears fine and its success is demonstrated by
the inclusion of a number of sample images shown at different sizes.
However it is difficult to quantify the success of the approach on a
small number of images. A larger evaluation would provide more evidence
to support to the approach.

Recommendation (1 - bad, 6 - good):5
Overall a solid piece of work that merits a place in the conference.

Any other comments to make to the authors:

Typos and minor issues:

page2para4line4: "Alternatively[11]" --> "Alternatively [11]" numerous other examples
page2para8line7: "examine the a larger" --> "examine a larger" ?
page5para1line3: "from [9]" --> " from [9]"
page9para4line2: "allow the" --> "allows the"

---------------------------- REVIEW 3 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
REVIEWER'S CONFIDENCE: 2 (medium)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 5 (The paper is good and should be included.)

Please provide a response to each of the following queries:

1) What was the paper about?

This paper proposed a new method to extract depth information from a sigle image, which was used to resize the image while the blurriness caused by our-of-focus was maintained.

2) What claims or hypotheses did the paper make (if any)?

The authors claims a technique to calculate depth values for an image.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The results of the work are well presented and the related work are clearly stated.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The paper is clearly structured and easy to follow. It may be of interest to researchers in the fields of Computer Vision and Image Processing.

Significance (1 - bad, 6 - good):

The contribution of this work may be considered incremental in nature.

Originality (1 - bad, 6 - good):

The work has some novel aspects including the segmentation of the iso-depth region.

Presentation (1 - bad, 6 - good):

The paper is well presentated and each section is clearly linked.

Validity (1 - bad, 6 - good):

The results showed in the paper are strong evidence to the validity.

Recommendation (1 - bad, 6 - good):

The paper should be accepted.

Any other comments to make to the authors:

Friday, March 05, 2010

procedural Inc are looking for a computer vision expert!

Procedural create CityEngine, one of the few useful procedural urban tools around. Does this mean that they are looking for an easier way to create models of buildings? Reconstruction from street photographs? From aerial photographs? Why not from existing meshes (from video game artists)?

Image-based Street-side City Modeling (pdf) from Sigg09 shows some of the potential of this direction.

I'll be looking forward to seeing what these guys do next.

Wednesday, March 03, 2010

first year report

Sorry, it's been a little quiet around here recently. One of many reasons is that I put together my first year report (updated pdf). Notes:.
•  Having taken criticism from blog readers and supervisors alike, I've updated this. Some of the cut bits appear below - if you've already read it, don't bother reading it again, nothing significant has changed.
• Some of the more interesting results from my time at ASU have been cut for client confidentiality. This is getting quite annoying now, will try not to fall into non-disclosure hole again.
• It's mainly a bunch of blog posts end on end.
• It isn't really of the quality required for a thesis.

Bits that didn't make the cut:

Possible Theses

Here I present a few possible directions that I see my work progressing in. Because of the new ground that is being covered, posing these direction as theses is a little artificial at this point.

Procedural Workflow

Useful procedural systems can be constructed by non-technical users
Three dimensional modelling packages such as Wavefront's 3D Studio Max or Maya present artists with very complicated interfaces for the creation of 3D content. These systems include data flows, such as 3DSM's hypergraph which allows different properties of a material to be edited, and construction histories. It should be possible to build a system of similar complexity, that can be used by 3D artists to enable them to build procedural objects. This would involve investigating subjects such as intelligent primitives (such as blobby objects or z-spheres), the generative modelling language (GML), and current 3D workflows.

Reverse proceduralization

Reverse engineering real world or artist created objects allows for effective proceduralization of geometry
From a design perspective reverse proceduralization, or reverse engineering allows artists to use their current tools to create a single instance, and then perform proceduralization as an after-thought. This would give the easiest integration into the current workflows of all proposed procedural solutions. Study in this subject would involve machine learning techniques to extract form from designs, after sufficient pattern-analysis has been performed.

Floating Point Robustness

It is possible to perform robust geometrical computations in a floating point environment.
One of the upshots of my work with the straight skeleton and the Voronoi tessellation is the realisation of how hard it is to work with geometry in a floating point environment. I found that the vast majority of my time was concerned with robustness issues. For example a point that is on one side of a line can appear on the other if the calculation is repeated with an indentical line travelling in the opposite direction.

There are well known limits, within which standard geometric queries fail to behave in a predictable fashion. The result is quite chaotic, and makes designing robust algorithms both time consuming and bug-prone.

There are several attempts to make computational geometry more robust, but I've found none to make it easier. The ideal goal here would be to create an library that an undergraduate could use for robust geometric manipulation, in place of CGAL or similar.