Monday, September 10, 2012

The complexity of simple

Recently Sean Gillies and others have been looking at the concept of geometric simplicity as implemented in JTS/GEOS, specifically in the case of polygons.  The TL;DR is that JTS has a bug because it reports the following polygon as simple:

POLYGON ((39.14513303 23.75100816, 97.05372209 40.54550088, 105.26527014 48.13302213, 100.91752685 58.43336815, 71.56081448 83.55114801, 60.71189168 86.25316099, 62.00469808 75.1478176, 83.16310007 42.82071673, 92.82305862 37.19175582, 95.99401129 26.47051246, 106.22054482 15.51975192, 39.14513303 23.75100816))

This polygon is invalid, because the shell ring self-intersects.  This means the ring is non-simple - but it's not so clear-cut for the polygon itself.

In fact, reporting the polygon as simple isn't a bug - it was a deliberate design decision.  The original rationale was:

  • the OGC Simple Features Specification in section 2.1.10 defining polygon geometries states that "Polygons are simple geometries".  (This is consistent with the requirement that valid polygons contain no self-intersections).  
  • JTS usually follows the principle that computed results are only well-defined when the input geometries are valid.  This is because it can be expensive to check for validity in order to handle invalid inputs gracefully.  Validity testing is factored out into a separate method (isValid) in order to allow the client to decide when to incur the code of executing it.
For these reasons the initial JTS implementation of isSimple returned true for all polygonal inputs, thus avoiding the cost of checking for self-intersections.

However, perhaps it would be more useful to actually carry out some testing of simplicity.  This raises the question of what exactly are the semantics of isSimple for polygons?  Looking to the SFS for guidance, it has the following somewhat imprecise general definition of simplicity:
the Geometry has no anomalous geometric points, such as self intersection or self tangency. The description of each instantiable geometric class will include the specific conditions that cause an instance of that class to be classified as not simple.
For polygons the spec simply states:
Polygons are simple geometries
(In contrast, for linear geometry the specification has a rigorous mathematical definition of simplicity.)

Apart from the "always simple" semantics, there are at least two other options:
  1. Polygons are simple if their component rings are simple (i.e. have no self-intersections)
  2. Polygons are simple if they have no topological anomalies.  This is equivalent to checking whether the polygon is valid.
Because option #2 simply replicates the semantics of isValid, it seems more useful to adopt option #1 to provide more possibilities for analyzing polygon topology (but I'm definitely open to suggestions about why #2 would be better!)

This change to the semantics of isSimple has been implemented in JTS and will appear in the 1.13 release.  The code will also be extended to handle GeometryCollections, with the semantics that they are simple if all their components are simple.


sgillies said...

Martin, it seems to me that simplicity might be more inherent in polygons than in linear geometries. We can write WKT representations of valid, non-simple lines, but can't write such representations of valid, non-simple surfaces.

Dr JTS said...

Glad you read this, Sean! I left a reply to this on your blog.

I'd be interested to hear exactly what the use case is for computing simplicity of polygons - as opposed to determining validity.

Kevin Wiebe said...

I also recall us wrestling with the issue on whether EMPTY LinearRings were closed or not, where OGC is silent on the issue. This affected whether EMPTY LinearRings would be considered Valid/Simple.

This change you have implemented - does it affect how EMPTY would be handled as well?

isValid(POLYGON((1 1,5 1,5 5,1 5,1 1),EMPTY,(2 2,2 3,3 3,3 2,2 2)))
isSimple(POLYGON((1 1,5 1,5 5,1 5,1 1),EMPTY,(2 2,2 3,3 3,3 2,2 2)))

I certainly value your leadership here.

My concern is not so much what the "right" answer is to the above, but rather consistency. I have come to the belief that several areas of the OGC spec are silent on issues like this. There are different interpretations of the specs, all of which are consistent. However, it is best for the community if we are consistent between our separate implementations. (If we are not, we will run into trouble when we use one set of tools to filter/fix data and then load data using another set of tools which then complain.)

I encourage all implementers to take time, as you are Martin, and make their software's behaviour in cases like these intentionally and rationally.

... alternatively, we'll all just nod and ride on your coattails. :)

Dr JTS said...

Kevin, that's a very good point - I hadn't been thinking about the issue of how simplicity applies to empty geometry, but of course it needs to be defined.

For isSimple, my take is that empty geometries are simple - since they obviously don't contain any "anomalous points". In general, I think it's best to report geometries as simple unless there is a clear reason to do otherwise.

The same goes for validity.

Closed is trickier. I seem to recall the decision was that empty geometries are not closed (which is what's currently implemented in JTS).

And I definitely agree - it would be nice if spatial software can agree on a consistent model, whether that is de jure or de facto.

Kevin Wiebe said...

More food for thought:

One argument I came across in a newsgroup was that in some cases (like the results of some processing) empty LinearRings were expected as reasonable output, and therefore should be Valid. And because all Valid LinearRings must by definition already be Closed, one can conclude that empty LinearRings must be closed.

Dr JTS said...

I agree with that reasoning, and actually I just checked a bit more carefully, and that's what JTS does now.

So the existing logic is:

- POLYGON EMPTY is closed
- POINT EMPTY is closed (as are all points)

sgillies said...

Well, of course I read it, Martin. I remain a devoted subscriber to your blog, and to blogging in general :)

I also need to better understand my user's case for simplicity. It's not a test I ever make of polygons.