Tuesday, March 31, 2009

geometric robustness papers

I've suddenly lost faith in floating point (FP) arithmetic for geometry. This is best explained by the CGAL people. Here are my notes on papers I've read:

Classroom Examples of Robustness Problems in Geometric Computations ( web | Lutz Kettner, Kurt Mehlhorn, Sylvain Pion, Stefan Schirra & Chee Yap | 2006 )

This is a great introduction to the properties of floating point arithmetic in a floating point environment. The authors set out to describe a complete set situations where floating point upsets a few simple algorithms.

The authors describe this in respect to calculating the convex hull (gift wrapping algorithms) and finding the 3d Delunay triangulation of a set of points. They give the facts that the algorithms rely on, and show how these often do not exist in the floating point situations.

The most common cause of error is that when working with number of different sizes, the smaller one is always rounded to fit into the larger one. When working with points, this means that the precision of a calculation is limited by the largest point involved. In many situations these number can be spatially remote. The below image shows the results of trying to decide which side (of the black) line a point lies on. The point is moved over the (256x256) grid in the smallest steps allowed by double precision. The grid is the colour coded based on which side of the line the algorithm reports the point to be on (red, yellow and blue for right, on-the-line and left respectively).

Different lines are examined in each of the three images - the patters of the errors are quite non-intuitive. We often see points classified on the wrong side of the line.

The patterns get messier if we look at higher precision floating point types. The paper does a very good job at explaining how these properties of floating point can lead to gross errors in the algorithms.

It briefly looks at techniques that don't solve the problem - adding an epsilon value (video below) and perturbing the input.

Here is my video of animating the epsilon value in a geometric floating point context:

We're evaluating the same function to determine which side of a line a point is on (except the horizontal is flipped). Over each frame we move the point around. Blue is one side, red the other, yellow on the line. We move the point by the smallest quantity allowed by the double precision floating point system.

The animation shows an increasing "fudge factor" or epsilon added to try and make the computation more robust. The boundary stays just as strange!

Animating the epsilon value of the first example in the paper from 0 in steps of 2^-44 @ 3 frames per second. Java code.

Construction of the Voronoi diagram for `one million' generators in single-precision arithmetic ( pdf | ieee | doi | Sugihara, Iri | 1992 )

Hence, we change our goal; we try to construct a diagram which shares some topological properties with the true Voronoi digram....the output "converges" to the true Vornoi digram as the precision in computation becomes higher.

To do this the paper's authors present version of the insertion Voronoi algorithm that modifies the topology from one consistent state to the next. The basic step is to query the topological model before each cell is included in the insertion of a new point.

Given a new point to insert (i, below), the topological model has three rules relate to finding the set of cell-corners to modify:
  • The set of cell corners is not empty
  • The set of cell corners form a tree
  • Every corner is reachable from every other corner by traversing existing Voronoi edges that lie between cells in the set.
These properties have are not shown to be necessary and sufficient to derive the Voronoi (I think they are, elsewhere, see the references in next paper - Topologically Orientated Implementation), but they get close - selecting topologically consistent cells. The algorithm "flood-fills" a region of the cell-corners, starting with the nearest to the new point (blue, i, below), collecting cell-corners (ii, green dots) that satisfy these conditions. Once all adjacent corners have been shown to be inadmissible (red dots), all the admissible ones can be proved to have been found. Then (iii), the Voronoi graph and the topological map is updated, and we move onto the next new point:


There is also a section on how to compute with the least floating point inaccuracy, but it isn't as interesting. The cunning work here was breaking the algorithm down into a set of topological manipulations that are always valid.

Sometimes the floating point output isn't valid, that is cells and lines overlap. But that happens in areas proportional in size to the error margin (FP accuracy) - they could easily be post-filtered out if required. The robustness of the algorithm can produce output such as the million points in the above (black and white) figure, that would otherwise cause visible trouble.

The approach can be seen as using the topology as the central structure that is guided by the inaccurate floating point calculations.

Topology-Oriented Implementation—An Approach to Robust Geometric Algorithms ( doi | pdf | K. Sugihara, M. Iri, H. Inagaki & T. Imai | 2000 )

This is another presentation of the previous material with a couple of different examples. Logical and combinatorial computations can be done correctly while geometric ones are unreliable.

The overview is that geometric algorithms have a set of topological solutions, that branch from each other. One of these is the correct solution, but as long as we end up somewhere on tree of valid solutions we're doing ok. The floating point gods (inaccuracies) decide which branch the algorithm will follow. More formally (quoted:)
  • Collect purely topolgical properties that should be satisfied by the solutions of the problem and can be checked efficiently
  • Describe the basic part of the algorithm only in terms of combinatorial and topological computation in such a way that the properties are guaranteed
  • Conduct numerical compuations at each note of the tree in order to choose the branch that is most licely to lead to the correction solution
Since we can't even detect degerate geometric cases consistently, the output might be degenerate, but if well designed, the degeneracies will be at around the floating-point-accuracy-scale. As the precision of the FP calculations increases, the solution converges on the desired one.

The paper includes examples for surface-slicing and Voronoi tessellation.

Lazy Arithmetic ( doi | pdf | ieee | Dominique Michelucci & Jean-Michel Moreau | 1997 )

This is a subject overview paper.

It outlines several solid geometric examples that are broken by floating point arithmetic -
  • The point where two lines cross is on a different side of each line
  • Zero determinants of non-singular matrices (the result changes depending on how the calculation is done)
  • Lexicographical order of line crossing (y1 < y1 ="=">
  • Simple comutations expressed in different ways suffer different truncation limits so one equation has two answers
  • Pappus' theorem doesn't generally hold
  • Borderline cases when using matrix determinants to identify angles less than or greater than a right angle
Two main classes of ways of dealing with these problems are presented - inexact and exact algorithms.

Inexact algorithms do not produce the exact mathematical result, but approximate this result. Since FP is only an approximation in the first place, these can be viewed as a set of tools for ensuring that the algorithms do not end up in an inconsistent state (they don't infinetely loop or teminate abnormally).
  • Warned programming involves coding tricks using limited reasoning (topological data or exact number representation) as each borderline case is found.
  • Epsilon Heuristics - assume that two items are identical if they are within a given range (epsilon). There is no way to guarantee that this approach will produce a consistent internal state - we are effectively shuffling points in one region of space.
  • Symbolic Programming - By ensuring that the algorithm's choices are guided by topological or combinatorial reasoning, floating point errors will only change the path not create invalid results. This is the idea presented in the Topology-Orientated Implementation paper by Sugihara.
  • Historical Book-keeping - By keeping a list of decisions and ensuring that all following decisions are consistent with previous decisions, the world is consistent from the algorithms perspective. This approach is described as being cumbersome and using lots of memory.
  • Confidence Intervals - Formalised Epsilon Heuristics - results involving floating point calculations can be "I don't know" when FP is known to be incorrect.
Entirely exact methods on the other hand always calculate the true value of every expression. Since we're often interested in boolean answers (is this point on the left or right of that line), we do a lot of additional work.

Ten random points with double-precision coordinates were triangulated using floating-point arithmetic in .1 seconds; 10 random points with rational coordinates (two-digit numerator, three-digit denominator, in base 2) were triangulated using rational arithmetic in 1,200 seconds, generating intermediate values with as many as 81 digits. [Efficient Delaunay Triangulation Using Rational Arithmetic: Karasick et.al]

The additional processing makes exact methods formidable, therefore mixed methods are introduced. These first perform the calculations in a specific accuracy, but track the error bounds to ensure the result is correct. Classes of language such as XSC do this automatically. If the result is unstable around the bounds then the algorithm reverts to exact arithmetic.

One approach (Karasick/Yamagushi) converts all calculations to computing the sign of a 4x4 determinant. This is then calculated using the most significant bits, then adds additional bits until the result is unambiguous. While I haven't seen enough geometry to be able to say if all problems can be found my calculating the det of a matrix - I am sure many of the important ones (which side of a line does a point lie on?) can be. Fortune and van Wyk implement an automatic system along these lines that compiles down to c++ source.

Several of the exact methods pre-process a specific problem to calculate how much precision is required, then calculate using these figures.

At this point the authors of the paper point out that using a specific tolerance ("long integer libraries" eg BigNum) means that division often has to be barred for fear of recurring decimals. All of the presented exact methods need lots of additional work by the programmer.

The paper continues to introduce Lazy Evaluation - floating point calculations are performed until the error is enough to introduce uncertainty. At this point it reverts to using exact? evaluation.

The operations performed on the values ("Lazy Numbers") are tracked by a parse-tree (or symbolic computing) like graph specifying operators and operands. When an operation is ambiguous (within the error bounds) the graph is evaluated using exact arithmetic.

This seems like a dream answer for Declarative programming enthusiasts (Haskell!).

The cost of Lazy Arithmetic is given as a 4-10 decrease in speed, which considering it's a big leap away from the computer's architecture, is very pleasing. Even better it's claimed to be 150 times faster than purely exact methods. In systems where this method is used (CGAL) the cost appears to be much larger for more complicated algorithms.

The paper describes a method for hashing lazy numbers, and shows that non trivial equality tests can be expensive. The system implemented only does very simple algebraic/symbolic equivilences, otherwise expesive tests are needed. (Probabilistic hashing is given as a possible solution)

Problems with Lazy Numbers include
  • lack of hardware support,
  • need for the programming lanaguage to allow a new number types, along with operators on those numbers.
  • conversion back to FP for output is problematic - polygons may self-intersect after processing. (But at least the algorithm will run, and at least the errors will be close to the size of FP accuracy)
  • numbers such as pi or e can still cause lots of problems

1 comment:

  1. Robust 3d boolean unions: http://portal.acm.org/citation.cfm?id=1735606&preflayout=flat#revs