Friday, March 19, 2010

exploring frequency responses for focus

In the name of badly rushed science for conference deadlines, I present the accumulation of a couple of weeks' evenings spent messing around learning about frequency space (pdf).

[tl;dr] This paper got accepted, and I "presented" it at the SICSA 2010 conference. Here's the video:





While I'd always seen the little bars on music amplifiers, I'd never thought of images being represented in the same way. The bars represent the frequencies being played back at any one time. The low frequencies (slower moving bars, normally on the left) are the deep sounds, and the high frequencies (fast bars on the right) are the high sounds. It turns out they have a nice analogue in the image plane, but because we don't look at every pixel in a photo in order from start to end over 3 minutes, we never see them.

If we identify the important areas of an image for each frequency (DoG pyramid/"monolith"), we can animate over the frequency (high frequencies first, then the low ones):



We can then see that a single point in the image has different intensities at different frequencies, as the shade of grey at a point changes. So there's one of the little bar-graphs for each pixel.
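As a rough illustration of the per-pixel bar-graph idea (a minimal sketch, not the code from the source download), the following builds a difference-of-Gaussians stack for a greyscale image and reads off one "bar" per frequency band at a chosen pixel. The class name `DogStack`, the method names, and the parameters `sigma0` and `sigmaRatio` are all made up for this example, and the stack is kept at full resolution rather than downsampled like a true pyramid.

```java
class DogStack {
    /** Separable Gaussian blur of a row-major greyscale image; edge pixels are clamped. */
    static float[] blur(float[] img, int w, int h, double sigma) {
        int r = Math.max(1, (int) Math.ceil(3 * sigma));
        float[] k = new float[2 * r + 1];
        float sum = 0f;
        for (int i = -r; i <= r; i++) {
            k[i + r] = (float) Math.exp(-(i * i) / (2.0 * sigma * sigma));
            sum += k[i + r];
        }
        for (int i = 0; i < k.length; i++) k[i] /= sum; // normalise so flat areas stay flat

        float[] tmp = new float[img.length];
        float[] out = new float[img.length];
        for (int y = 0; y < h; y++) {            // horizontal pass
            for (int x = 0; x < w; x++) {
                float acc = 0f;
                for (int i = -r; i <= r; i++) {
                    int xx = Math.min(w - 1, Math.max(0, x + i));
                    acc += k[i + r] * img[y * w + xx];
                }
                tmp[y * w + x] = acc;
            }
        }
        for (int y = 0; y < h; y++) {            // vertical pass
            for (int x = 0; x < w; x++) {
                float acc = 0f;
                for (int i = -r; i <= r; i++) {
                    int yy = Math.min(h - 1, Math.max(0, y + i));
                    acc += k[i + r] * tmp[yy * w + x];
                }
                out[y * w + x] = acc;
            }
        }
        return out;
    }

    /** bands[i][p] = |blur(sigma_i) - blur(sigma_i * sigmaRatio)| at pixel p; band 0 is the finest detail. */
    static float[][] dogStack(float[] img, int w, int h, int levels, double sigma0, double sigmaRatio) {
        float[][] bands = new float[levels][];
        double sigma = sigma0;
        float[] prev = blur(img, w, h, sigma);
        for (int i = 0; i < levels; i++) {
            sigma *= sigmaRatio;
            float[] next = blur(img, w, h, sigma);
            float[] band = new float[img.length];
            for (int p = 0; p < band.length; p++) band[p] = Math.abs(prev[p] - next[p]);
            bands[i] = band;
            prev = next;
        }
        return bands;
    }

    /** The little bar graph for one pixel: its response in every band, finest first. */
    static float[] barsAt(float[][] bands, int w, int x, int y) {
        float[] bars = new float[bands.length];
        for (int i = 0; i < bands.length; i++) bars[i] = bands[i][y * w + x];
        return bars;
    }

    public static void main(String[] args) {
        int w = 16, h = 16;
        float[] checker = new float[w * h];      // one-pixel checkerboard: all fine detail
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                checker[y * w + x] = (x + y) % 2;
        float[][] bands = dogStack(checker, w, h, 3, 0.5, 2.0);
        System.out.println(java.util.Arrays.toString(barsAt(bands, w, 8, 8)));
    }
}
```

On the checkerboard nearly all of the response lands in band 0, which is what a sharply focused, finely textured region should look like; a smooth or defocused region would instead have its energy in the later bands.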

I built a little application that lets you see these graphs and the spatial frequencies in an image. It's quite fun to play with, you can start it by clicking here (java webstart, binary file, run at your own risk, source code below). Wait for it to load the preview images, select one, wait for it to build the map (lots of waiting here...) and then use the right-drag to move around, wheel to scroll, and the left button to show a frequency preview for a particular location. Move the mouse out of the frame to see the representative frequency map from the work.

As you drag the point around the lil bars change to show you what the frequency content is like in that area of the image.

tripping_crow

This was neat, and I had to do something with it, so I built a hoodicky that recovers a depth map from the focus of a single image. I assume that stuff in focus is near the camera, and stuff out of focus is a long way away - photography just like your Mum used to do. It turns out that not too much work has been done in this area; these guys even got a book article out about it last year.

So just too late to be hot news... but interesting nonetheless. So I decided to twist the concept a bit and use it for blur-aware image resizing. The motivation being that you need big (expensive) lenses to take photos with a shallow depth of field, but when you resize these images, you lose that shallow depth of field:

sicsa_hierarchy

In the smaller images, more of the logo appears to be in focus, but it's the same image, just scaled a bit.
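To put rough numbers on why this happens: if a background region is blurred by a Gaussian of sigma s pixels, scaling the image by a factor k shrinks that blur to k*s pixels, so it looks sharper at thumbnail size. Because Gaussian variances add under repeated blurring, the extra blur needed to get back to s pixels is sqrt(s^2 - (k*s)^2). The sketch below just evaluates that; it assumes blur can be modelled as a Gaussian and that "keeping the look" means keeping the blur width in pixels, which is one reading of the idea rather than the paper's stated formulation. `BlurCompensation` and `extraSigma` are names invented for the example.

```java
class BlurCompensation {
    /**
     * Extra Gaussian sigma (in output pixels) to apply after scaling by `scale`
     * so that a region originally blurred by `sigmaOriginal` pixels still shows
     * `sigmaOriginal` pixels of blur. Assumes 0 < scale <= 1 (downscaling only).
     */
    static double extraSigma(double sigmaOriginal, double scale) {
        if (scale <= 0 || scale > 1) {
            throw new IllegalArgumentException("scale must be in (0, 1]");
        }
        double shrunk = sigmaOriginal * scale;   // blur width left after resizing
        // Gaussian blurs compose by adding variances: target^2 = shrunk^2 + extra^2
        return Math.sqrt(sigmaOriginal * sigmaOriginal - shrunk * shrunk);
    }

    public static void main(String[] args) {
        // A 4 px background blur in an image scaled to half size
        System.out.println(extraSigma(4.0, 0.5));
    }
}
```

For a 4 pixel blur at half size this comes out at roughly 3.46 pixels, while an in-focus region (sigma 0) gets no extra blur at all, so only the background is softened.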

So we want something that keeps the proportion of each frequency the same as you scale the image. Basically, it's a thing that keeps foreground/background separation when scaling an image. We can use the focus of an image (closely connected to its frequencies) to determine the depth, as this video shows.



In the following, a, b and d are normal scaling, while c & e use the depth map we've calculated using the frequency map.

woman

This uses the frequencies found along the edges in the photo to classify each segment of the segmented image. The next figure shows the same kind of comparison at the top, with the segment-frequency map bottom left and the recovered depth map bottom right.

crow_results
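The segment classification step can be pictured roughly like this: average a per-pixel sharpness score over each segment's border pixels, then rank segments from sharpest (near) to blurriest (far). This is a minimal sketch under assumptions, not the released implementation: the label map is assumed to come from some earlier segmentation, the sharpness score from something like the highest-frequency band response, and the names `SegmentDepth`, `borderSharpness` and `depthFromSharpness` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SegmentDepth {
    /**
     * Mean sharpness over each segment's border pixels (pixels with a 4-neighbour
     * in a different segment). Segments with no border pixels are left out.
     */
    static Map<Integer, Double> borderSharpness(int[] labels, float[] sharpness, int w, int h) {
        Map<Integer, double[]> acc = new HashMap<>();   // label -> {sum, count}
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int p = y * w + x;
                int label = labels[p];
                boolean border = (x > 0 && labels[p - 1] != label)
                        || (x < w - 1 && labels[p + 1] != label)
                        || (y > 0 && labels[p - w] != label)
                        || (y < h - 1 && labels[p + w] != label);
                if (border) {
                    double[] a = acc.computeIfAbsent(label, key -> new double[2]);
                    a[0] += sharpness[p];
                    a[1] += 1;
                }
            }
        }
        Map<Integer, Double> mean = new HashMap<>();
        for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
            mean.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return mean;
    }

    /** Depth in [0, 1]: 0 for the sharpest segment (assumed nearest), 1 for the blurriest. */
    static Map<Integer, Double> depthFromSharpness(Map<Integer, Double> sharp) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double s : sharp.values()) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        Map<Integer, Double> depth = new HashMap<>();
        for (Map.Entry<Integer, Double> e : sharp.entrySet()) {
            depth.put(e.getKey(), max > min ? (max - e.getValue()) / (max - min) : 0.0);
        }
        return depth;
    }
}
```

Only border pixels are scored because flat interiors carry almost no frequency information either way; the in-focus-means-near mapping is the same simplifying assumption made earlier in the post.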

More results (SICSA are the people who pay half my rent...):

sicsa_results

Here a,b,c are standard scalings and d,e are from my algorithm.

lbird_results


Results are better than I expected for the few days I spent putting this thing together, but the basic problem is there are two or three parameters that need to be tuned for each image to take into account the noise of the image and the bias towards foreground/background classification. Good working prototype, lots of effort required to do this for real.

The write up looks like this (pdf, src below), I'm a bit certain some of the equations are wrong - but this is computer science, no one actually reads equations do they?



Source code is here. Java. I think it uses Apache licensed stuff, and that's how it should be treated, but I scrounged a fair bit of code, who knows where it's from... Main class should be FrequencyMap, and after setting up an image in that class you'll need to play around with the four constants at the top:
  • fMapThreshhold: Increase if not enough red (high freq) in the freq map, decrease if noise is being counted as red
  • scaleFac: usually okay to leave alone. If you want a frequency map with more levels, decrease this, keeping it above one.
  • filterRadius: noise reduction on edge classification. Increase to classify more edges as higher frequency
  • Q: increase to increase the number of segmentations.
It will write files out to the disk. Use at your own risk.


[edit: 22/3/10]

While out riding this weekend I figured out that it should be possible to analyse the defocused edges for evidence of higher frequencies to determine if the edge is in front of, or behind the in-focus point. More importantly if we can't see any high frequency edges in the bokeh, then it doesn't matter if that part of the edge is in front of, or behind the defocused edge...

bokeh_determination

[edit:] woo! It's been accepted (probably because it's one of the few graphics papers from Scotland). Reviewers' comments follow:




Dear Author,

Thank you for your submission to the SICSA PhD Conference. We are delighted to inform you that, subject to corrections, your paper has been provisionally accepted to be presented at the conference. The comments from reviewers are included at the bottom of this email.

We will be in touch with you soon regarding the nature of the corrections that must be made in order for your paper to be included. You should note that the absolute deadline for these corrections will be MAY 19th 2010. No deadline extensions can be offered.

Regards


---------------------------------------------

Paper: 21
Title: Blur Preserving Image Resizing


---------------------------- REVIEW 1 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: -1 (weak reject)
REVIEWER'S CONFIDENCE: 1 (low)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 3 (The paper is below average, but I do not feel strongly as to whether it should be accepted or rejected.)


Please provide a response to each of the following queries:

1) What was the paper about?

The paper presents a method for scaling images which attempts to retain the image's apparent depth of field by selectively blurring parts of the image. The amount of Gaussian blur applied to each segment is proportional to the depth of the scene at that segment. This depth information is automatically calculated from a single image using an augmented segmentation algorithm.

2) What claims or hypotheses did the paper make (if any)?

The paper claims that:

- When images are scaled down the apparent narrow depth of field is lost.
- It is desirable to keep the same proportion of the image with the same CoC.
- The presented method is able to retain the apparent depth of field in an image as it is scaled down.
- The presented method is capable of narrowing the depth of field apparent in images captured using commodity compact cameras to allow a larger aperture to be simulated.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The paper introduces a significant new method which can be useful in practice (as it requires very little information and/or extra equipment). The method has been clearly presented and is understandable. Plenty of images have been provided which do add to the text. The algorithm appears to perform the intended task well.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

The first claim (that apparent depth of field is lost when images are scaled down) has not been justified. The efficacy of the method at retaining the depth of field has not been quantitatively analysed, nor has it been compared to the bog-standard scaling algorithms or the naive blurring approaches. The method has only been demonstrated to work on a number of hand-chosen examples. The authors make a third claim in passing which has not been demonstrated or backed up in the paper.

The paper should be shortened down to 10 pages. Changes could be made to sentence structure (particularly in the introduction and conclusion) to make the paper more understandable. The relationship between the presented method and previous related work could be explained in more detail.

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The presented method is easy to follow. Only the presentation occasionally gets in the way.

Significance (1 - bad, 6 - good):

Originality (1 - bad, 6 - good):

It is of interest to see how the presented method relates to the existing literature on single-image depth estimation. Whilst the application area is novel, the underlying machinery may not be.

Presentation (1 - bad, 6 - good):

A number of grammatical and spelling errors exist. The paper is too long (14 pages). Some parts of the text feel unfinished (in particular the introduction and conclusion).

Validity (1 - bad, 6 - good):

The authors present a convincing technique, and also point out its potential shortcomings in the conclusion section.

Recommendation (1 - bad, 6 - good):

The paper, in its current form, is not ready for publication. First, the method needs to be analysed and compared quantitatively. Second, the writing, layout and presentation in general should be improved.

Any other comments to make to the authors:

(Changes have been capitalised)

- Grammar: '... large apertures, WHICH has led ...'
- Readability: '... That is the sharp area of focus ...'
- Readability: '... apparent narrow depth of field is lost, figure 1 ...'
- Readability: best to define the Circle of Confusion (CoC) *immediately* after it is introduced in Section 3.
- Readability: the CoC could be explained more clearly in Section 3.1
- Readability: caption for Figure 2
- Style: figure 1, figure 2, etc should be Figure 1, Figure 2, etc
- Style: best not to begin figure captions with labels (a, b, etc)
- Style: there should be a space between citation numbers and preceding text:
'... an all-focused image[2] ...' should be '... an all-focused image [2] ...'
- Typos: extra bracket at the end of the 2nd equation on page 4
- Typos: '... whilst we examine the a larger frequency space.'
- Typos: '... the frequency content of ITS border ...'
- Citations: the first citation (Wikipedia) has not been made correctly



---------------------------- REVIEW 2 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
REVIEWER'S CONFIDENCE: 1 (low)
Audience Appeal: 5 (Easy to follow and read with minor points remaining unclarified.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 5 (The paper is well structured and can be fully understood. However, some sentences are badly or awkwardly written.)
Validity: 4 (Valid, relevant approaches are made, but they are not convincing or may be a little impractical.)
Recommendation: 5 (The paper is good and should be included.)


Please provide a response to each of the following queries:

1) What was the paper about?
This paper describes an image resizing algorithm that improves quality by
maintaining depth perspective.


2) What claims or hypotheses did the paper make (if any)?
This paper identifies that depth perspective is lost when images
are reduced in size resulting in a loss of quality. With a growing
number of collaborative image viewing sites where images are initially
viewed as a thumbnail the loss of depth perspective results in a loss
of quality in the resized images.

An algorithm is proposed for resizing images which maintains depth
perspective. The image is segmented, each segment is classified to a
depth using the frequency and then blurred to match the new size.


3) What are the good points of the paper? Why might it be accepted? What did the author do right?
I enjoyed reading this paper. It is a clear, well written paper with
a good structure that is easy to follow.

The problem being addressed is clearly identified and well described
both in the text and by example. Most readers will have practical
experience of this issue making the paper particularly relevant to
a general audience.

The paper makes good use of numerous examples to explain each stage
of the approach. This made it easier to understand the impact of each
stage of the resizing process even if some of the underlying technical
aspects were, by their nature, more difficult to follow.


4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

My biggest issue with this paper is the evaluation. A new algorithm
is proposed with evidence of its effectiveness being left to the
reader by the inclusion of 4 sample images that have been resized.
I would expect to see stronger evidence to support the proposed
approach. While a quantitative evaluation may be difficult a user
evaluation should be possible and could provide evidence both that
the problem is important and that the proposed approach is a
perceived improvement.

The assumption that the in-focus object is in the foreground appears to
be a very limiting restriction of the approach. Although raised as
an issue in the conclusions, this issue was not discussed in sufficient
detail and merits more discussion on the extent to which this would
limit the use of the algorithm.

The abstract is OK as far as it goes but is too short. It should be
expanded to give, at least, an indication of the methodology adopted.



Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):5
Maintaining image quality on resized images is a subject area that
most of the audience will be able to relate to. This paper is likely
to be understandable and of interest to a wide spectrum of a general
audience. However, some aspects of the paper are understandably quite
technical.

Significance (1 - bad, 6 - good):4
The problem being investigated is clearly identified and appears to
be a genuine issue at least conceptually. However, given the lack of
a user evaluation it is difficult to judge the importance of the
problem and the extent to which this work solves it and improves
the quality of image resizing.


Originality (1 - bad, 6 - good):4
The work appears novel. Perhaps not in the broad approach used but at
least in the combination of techniques used and for the specific
application addressed in this paper.

Presentation (1 - bad, 6 - good):5
This paper is well written and has a good structure. Typos and minor issues
are listed below - although very few.

Validity (1 - bad, 6 - good):4
The methodology adopted appears fine and its success is demonstrated by
the inclusion of a number of sample images shown at different sizes.
However it is difficult to quantify the success of the approach on a
small number of images. A larger evaluation would provide more evidence
to support the approach.

Recommendation (1 - bad, 6 - good):5
Overall a solid piece of work that merits a place in the conference.

Any other comments to make to the authors:

Typos and minor issues:

page2para4line4: "Alternatively[11]" --> "Alternatively [11]" numerous other examples
page2para8line7: "examine the a larger" --> "examine a larger" ?
page5para1line3: "from [9]" --> " from [9]"
page9para4line2: "allow the" --> "allows the"



---------------------------- REVIEW 3 --------------------------
PAPER: 21
TITLE: Blur Preserving Image Resizing

OVERALL RATING: 2 (accept)
REVIEWER'S CONFIDENCE: 2 (medium)
Audience Appeal: 4 (A typical CS honours graduate will fully understand what the paper is about, but is unlikely to understand all the details.)
Significance: 4 (Interesting work which will probably be cited even if it does not shake the field of research.)
Originality: 4 (A minor or incremental improvement to existing work that is not immediately obvious.)
Presentation: 4 (The paper makes sense and the story is intelligible. However, the structure or grammar is sub-optimal, potentially making it frustrating to read.)
Validity: 5 (Valid, practical, relevant approaches are made and are probably correct but may not be concrete.)
Recommendation: 5 (The paper is good and should be included.)


Please provide a response to each of the following queries:

1) What was the paper about?

This paper proposed a new method to extract depth information from a single image, which was used to resize the image while the blurriness caused by out-of-focus regions was maintained.


2) What claims or hypotheses did the paper make (if any)?

The authors claim a technique to calculate depth values for an image.

3) What are the good points of the paper? Why might it be accepted? What did the author do right?

The results of the work are well presented and the related work is clearly stated.

4) What are the bad points of the paper? Why might it be rejected? How might the author improve their work?

Please also provide a justification for the scores you have provided in each of the following areas:

Audience Appeal (1 - bad, 6 - good):

The paper is clearly structured and easy to follow. It may be of interest to researchers in the fields of Computer Vision and Image Processing.

Significance (1 - bad, 6 - good):

The contribution of this work may be considered incremental in nature.

Originality (1 - bad, 6 - good):

The work has some novel aspects including the segmentation of the iso-depth region.

Presentation (1 - bad, 6 - good):

The paper is well presented and each section is clearly linked.

Validity (1 - bad, 6 - good):

The results shown in the paper are strong evidence of the validity.

Recommendation (1 - bad, 6 - good):

The paper should be accepted.

Any other comments to make to the authors:

