Showing posts with label jts. Show all posts
Showing posts with label jts. Show all posts

Friday, March 28, 2008

JTS at the sharp end of many arrows

Miguel Montesinos and Jorge Sanz from the gvSIG project have made a nice diagram showing the relationships between a bunch of GFOSS projects. It' s nice to see JTS close to the centre of the diagram - although having so many arrows pointed at it makes me a little nervous!

They show JTS having a dependency on Batik for some reason - that must be an error, there's no relationship between the two. Or does Batik use JTS?

It would have been nice to see a "parent-of" relationship between JTS and GEOS and NTS, since the latter two also form a key component of quite a few projects. And I'm pretty sure MapGuide OS uses GEOS, as does OGR.

But it's not surprising there's a few errors and omissions - it's a pretty complicated web of relationships. And this is over only what - 8 years? I wonder what the diagram will look like in another 8...


Monday, March 24, 2008

Time for JSTS?

Like a lot of stuff, spatial functionality is moving out into the browser. Signs of the times:
  • proj4js
  • GeoJSON
  • This comment: "The DOJO client-side libraries provide support for performing some simple spatial operations like ‘intersection’ on the client-side." (Although I haven't been able to find this functionality in a quick browse of Dojo - can anyone confirm this?)
Seems like maybe the time is ripe for a port of JTS to JSTS - the Javascript Topology Suite.

Take it away, someone - I'm too busy!

Wednesday, December 12, 2007

New Buffer Styles in JTS 1.9

JTS allows specifying styling parameters to control the appearance of generated buffer polygons.

End Cap styles have been available for a while. End caps can be Round (the default), Flat, or Square. For Round caps, the Quadrant Segments parameter controls the degree of approximation to a true circular arc.

End Cap Styles: Round, Flat, Square

JTS 1.9 adds the ability to specify the Join Style. The Join style controls how the offset curves for consecutive line segments are joined together. The Join type can be Round (the default), Mitre, or Bevel. If Mitre is specified, a further Mitre Limit parameter is provided to control how long mitred joins can be before they are beveled off. The maximum length of a mitred corner is equal to the Mitre Limit multiplied by the buffer distance.

Join Styles: Mitre (limit = 10), Mitre (limit = 1), Bevel

The JTS TestBuilder has been enhanced to allow specifying all of these parameters, so it can be used to experiment with the visual effects of the various styles.

Thursday, November 15, 2007

Fast polygon merging in JTS using Cascaded Union

A common requirement in spatial processing is to union a set of polygons together. Depending on the use case, the polygons may either be a coverage or be overlapping. Up until recently, there were two techniques for accomplishing this in JTS:
  • Iterated Union: Iterate over the polygons and union them one-by-one into a result polygon
  • Buffer Union: Collect the polygons into a GeometryCollection and run buffer(0) on the collection
The first one is straightforward, but quite inefficient for large input sets. The second technique is more efficient in many cases, but still tends to fail for large or complex data.

Since this is such a common operation, I've been keen to find a faster and more robust approach. An alternative strategy is to use a concept I call Cascaded Union. The idea is to union small subsets of the input together, then union groups of the resulting polygons, and so on recursively until the final union of all areas is computed. This can be thought of as a post-order traversal of a tree, where the union is performed at each interior node. If the tree structure is determined by the spatial proximity of the input polygons, and there is overlap or adjacency in the input dataset, this algorithm can be quite efficient. This is because union operations higher in the tree are faster, since linework is "merged away" from the lower levels.

This picture shows the algorithm in action:


Handily, the spatial indexes already in JTS can be used to determine an appropriate tree structure. Currently the Sort-Tile-Recursive packed R-Tree index is used - this seems to produce an effective tree (although I suspect that the algorithm is not very senstive to the precise spatial tree used)

It turns out that Cascaded Union can be much more efficient than Iterated Union. On a test dataset with 30,000 small polygons with a high degree of overlap, the performance results were:

Cascaded Union: 20 sec
Iterated Union: 3 hours 40 min !

Not bad - well worth the effort of coding... Also, it's more robust than Buffer Union, and often much faster as well.

This will be provided via the Geometry.union() (Unary union) method in JTS 1.9.

Wednesday, November 14, 2007

Implementations of Polygon Overlay algorithms

Occasionally people ask about the algorithms used in JTS's overlay operations. I don't have a good answer for this, since I basically made it up (in an informed kind of way), and haven't had time to document it to the level I'd like (thank goodness for self-documenting code... 8^)

I recently came across this web site from back in 1998, which is happily still around. It has a good list of implementations of what it calls Polygon Boolean operations. (This is the same thing as the JTS overlay operations - I chose to use a different term to avoid confusion with the spatial predicates). Perhaps some of these sites have done a better job of documentation! There's also some useful references to academic papers covering aspects of the issue.

Wednesday, August 8, 2007

PreparedGeometry - efficient batch spatial operations for JTS

A common usage pattern for JTS spatial operations (in particular, predicates such as intersects and contains, and overlay operations such as intersection and union) involves the operation being invoked between a single target geometry and a batch of test geometries. For instance, this occurs in queries in a spatial database such as PostGIS, where each geometry in the driving table interacts with a set of nearby geometries via the required operation. By taking advantage of the fact that the target geometry does not change, it is possible to significantly reduce the overall execution cost of the operation. Information such as an index on the line segments in the target geometry can be computed once and cached for reuse with all of the test geometries.

In order for this to work, the calling code must be able to indicate that the operation is being invoked in a batch setting. The standard JTS Geometry API doesn't allow differentiating this case, so the new concept of PreparedGeometry has been developed. (This name intentionally echoes the concept of prepared statement in SQL API's, which implements the same strategy.) Creating a PreparedGeometry on a target Geometry enables precomputation of various data structures, which then allows subsequent operations to be evaluated with maximum efficiency.

In addition to caching, it is also possible to optimize the evaluation of spatial predicates. Optimization takes the form of short-circuiting predicate evaluation when certain situations are detected. While not all OGC spatial predicates are amenable to optimization, fortunately the most commonly usd predicates, intersects and contains, can be optimized significantly. Examples of situations which can be optimized are:
  • For intersects, any intersection between line segments indicates a true result. If there is no intersection between any line segments of the operands, and the target geometry is polygonal, then one or more fast point-in-polygon tests can be performed to compute the result value.
  • Since it provides more information than intersects, the contains predicate is more complex to optimize. In the case where the target geometry is polygonal, evaluation can be short-circuited when the test geometry has no line segment intersection. In this case, containment can be tested by an appropriate set of point-in-polygon tests. It is also possible to short-circuit in some specific cases where a proper intersection is detected between two line segments. Both of these situations can be tested efficiently.
The performance gains from using PreparedGeometry in real-world situations can be dramatic - up to 40 times improvement in speed. This functionality will appear in the next release of JTS, and hopefully will make its way into GEOS and PostGIS as well.

Thursday, July 12, 2007

The ultimate Point-In-Polygon implementation

The classic stabbing line/ray-crossing point-in-polygon algorithm has been around for longer than computational geometry itself. (See William Franklin and SoftSurfer for background and description. ) The algorithm as usually published is short, effective and clever. But it does have some limitations:
  1. It is implemented as an iteration over a data structure representing a sequence of line segments. This ties the algorithm to a fixed data structure, which leads to code duplication
  2. It handles simple rings, but not fully general polygons (which may contain holes and/or multiple shells)
  3. It does not detect the case where the query point lies on one of the segments in the polygon
  4. It runs in O(n) time. This is reasonable for a single query point, but is not the most efficient way to perform many point queries against a single fixed polygon
  5. It uses conventional floating-point arithmetic. This makes it non-robust (i.e. potentially incorrect for some inputs)
This predicate forms a key component of many spatial processing algorithms. In JTS it's used all over the place. The more places it's used, the more the above limitations have become apparent. Over time I've been slowly chipping away at removing them.

Most recently, I've revisited P-I-P in the course of some work on improving the performance of spatial predicates. And I think I've finally arrived at the ultimate Point-In-Polygon implementation. It removes all of the above limitations with the following design features:
  1. It uses an incremental design, which frees the algorithm from being tied to any particular segment data structure
  2. It handles all polygonal geometry types defined in JTS. (Implementing this was made much cleaner by the pleasant realization that the PIP algorithm really doesn't care where in a polygonal geometry the segments come from. The only thing that matters is the parity of the segment crossings. This holds true for any number of holes and shells.)
  3. It detcts the point-on-segment case, and reports it via a trivalent return value (in the set {INTERIOR, BOUNDARY, EXTERIOR}).
  4. It is straightforward to use with a spatial index on the segments of the polygon (such as STRtree or BinTree), thus providing O(log n) performance
  5. Thanks to the RobustDeterminant class in JTS, the implementation is much more robust. (Due to some remaining floating-point arithmetic it may still not be 100% robust - but I hope to address that sometime as well.)
Even with all these changes, it's not much more complex than the original algorithm (at least, not given various supporting components which are already available in JTS.)

Coming soon to a JTS version near you...

Saturday, June 16, 2007

Presentations at FOSS4G 2007

I'm giving two talks at FOSS4G 2007, which is taking place here in Victoria in September. The abstracts are:

The JTS Topology Suite: Tools, Tips & Techniques

The JTS Topology Suite is one of the most widely used geometry libraries for Java. This talk will review the standard geometry methods in JTS, with an emphasis on their finer details. Some of the additional algorithms and components which the library provides will be discussed. Tips for improving performance and techniques for accomplishing various kinds of geometric processing tasks will be presented. The talk will conclude with an preview of some potential future developments for JTS.

Automatic watershed delineation using open source Java

The B.C. Corporate Watershed Base is a large-scale database of hydrographic information built using open source products. This talk discusses the development of an open source system for automated delineation of watershed boundaries based on CWB hydrography and a terrain model.


If previous years are anything to go by, this conference should be very informative and energizing. See you there!

Sunday, June 10, 2007

History of JTS and GEOS

It seems a good thing to record the history and provenance of JTS and GEOS, for the following reasons:
  • JTS and GEOS have attained a certain prominence in the open-source geospatial world. Sometimes comments are made which make assumptions about their goals and project structure which indicate a lack of knowledge about the true situation. Time to set the record straight!
  • The legal attribution of code provenance bears little relationship to the true origin of the intellectual property incorporated in these projects. It seems only fair that the people who actually contributed to these projects have their participation recorded.
  • As the designer/lead developer for these projects, I have the most knowledge about their origin and history.

JTS

JTS is the result of the fortuitous convergence of two situations. The first was that I had been working on spatial algorithms since the mid-90's. At that time I held a position in the BC Ministry of Forests, with a major focus being geomatics and spatial data management. While there I developed algorithms in C++ for polygonization and the beginnings of a polygon overlay algorithm. These were based on a simple spatial data model and included implementations of several fundamental computational geometric functions. A colleague, Brian Howden, was very encouraging in this work. Brian and I both felt that what the world really needed was a good spatial library, and this motivated me to start packaging my code as a reusable library. However, progress was slow and I did not feel like I had a clear idea of what such a library should look like. Moreover, library development in C++ was quite painful, since it requires such painstaking attention to details of memory management and code structure.

The second situation was my connection with Dr. Mark Sondheim, of the Geographic Data BC (GDBC) branch of the BC Ministry of Environment, Lands and Parks. I had known Mark since the early 90's, and had interacted with him on numerous occasions while I was working for the BC Government. Over the years we had many stimulating discussions about geospatial processing, and shared a similar interest in increasing the accessibility and formality of spatial data processing.

The JTS Topology Suite formally started life as a project conceived and initiated by Mark. The name and acronym were his idea, as was the concept of using the OGC Simple Features Specification as the basis for the API design. This latter idea was a key choice, since the SFS strikes a good balance between functionality and design complexity. Mark was confident that the SFS geometry model provided everything needed for most geospatial work, and time has proven him correct.

By 2000 I had moved to Vivid Solutions Inc. as a consultant, working primarily in Java and as much as possible on geospatial projects. While there I worked on a couple of spatial projects for GDBC which were Java based (most notably, a streaming parser for SAIF, a forerunner of GML developed by Mark Sondheim and others). I was also working on various projects which utilized my C++ spatial library (including some Java-based ones, necessitating the awkward and limiting use of JNI). Howver, there was never sufficient time or budget to contemplate re-implementing the C++ geometry library in Java (and, I have to admit, perhaps not the vision on my part that this would be as useful as it has turned out. In my defense, at that time Java was still proving itself as an efficient tool for development work).

Luckily, Mark did have the vision, and the access to funding. As a result, the JTS project was initiated in Fall 2000. The project goal was very clear: to develop a Java API which implemented the OGC Simple Features Specification. Given my experience and previous work, I suspect it was an easy choice to award the contract to Vivid Solutions. The project team consisted of myself as designer and lead developer, and Jonathan Aquino as a developer.

The project charter did not highlight the requirement for efficiency and robustness, since Mark wisely realized that these goals would have to be proven to be attainable during the course of the project. In hindsight the need for efficient performance should have been obvious; the need for robustness perhaps less so. (JTS is not exceptional in this area - a disappointly large percentage of published computational geometry algorithms fail to address this issue.) In any case, once the basic functionality was achieved, these two aspects emerged as much more important. Even today, they are still the most important issues in JTS development.

David Skea (at that time also with GDBC) contributed valuable direction and guidance. Most notably, it was his recommendation that an explicit precision model be provided, and he provided much insight about implementing robust spatial algorithms.

At the outset of the project it seemed that the main challenges were going to be designing the algorithms for spatial relationships and polygon overlay. Buffering was lurking in the background with an unknown level of difficulty. Mike Butler (the "father of SDE") had warned me that buffering would be a significant challenge, but since at that time he was still under NDA to ESRI, he did not provide me with any implementation advice. Later in the project when I tackled buffering I found out how right he was!

Development proceeded fairly quickly, with myself tackling API and algorithm design, and Jon working on I/O and implementations of structural methods. Jon also worked on the Unit Test facilities. David Skea contributed the core of the line segment intersection implementation, including the key development of deVillier's robust 2x2 determinant algorithm. Yao Cui of GDBC did good work on categorizing and defining test cases for the spatial predicates (which now appear as part of the JTS validation tests, a useful body of work in its own right).

In the middle of the project I realized that spatial visualization would be essential for designing and debugging spatial algorithms, and I also realized that Java2D and Swing would provide a great platform to build a visualization tool. This led to the development on the TestBuilder as an important component in the JTS toolkit. Jon and I shared development of the TestBuilder. This experience was very valuable in our subsequent development of JUMP.

JTS 1.0 was released in February 2002. The scheduling of this release was primarily motivated by contractual requirements. As soon as it was released I continued work on validation and buffer improvements. This resulted in the release of Version 1.1 in March 2002. At this point the contract with GDBC came to an end. I was encouraged by the success of the JTS development, and I made a personal commitment to continue to enhance JTS with better and more algorithms.

Subsequent releases of JTS have been motivated by my desire to see the library grow, by bug reports and suggestions from clients and users, and occasionally by directly-funded work (notably, work I carried out in 2006 on improving the robustness of polygon overlay). JTS has moved beyond the basic SFS by providing functionality such as polygonization, simplification, linear referencing, affine transformations, and a wide variety of structural functions. One day perhaps it will even provide a true topology API !

It's fair to say that JTS has been wildly successful in fulfilling the goal of providing a full-featured, robust, efficient library of spatial operations. It is being used in numerous spatial applications, notably:

  • JUMP (and derived projects)
  • GeoTools (and derived projects),
  • the BC Goverment/Moxie Media Internet Mapping Framework
There are many, many other applications and projects using JTS as well.

GEOS

In 2003 PostGIS was emerging as a serious and useful spatial database. However, there was one thing it sorely lacked - a complete set of spatial functions. Paul Ramsey and I strategized that the functions in JTS were an excellent basis for filling this gap. The big catch was that PostGIS required a pure C implementation. Never being one to shy away from thinking big, Paul proposed that we port JTS to C++. At that time Dave Blasby was still involved with PostGIS development, and it was he who came up with the name of Geometry Engine (Open Source) - GEOS.

Initially the GEOS project was funded jointly by Refractions Research Inc. and Vivid Solutions, with myself as Technical Advisor (and to a limited extent Architect), and a University of Victoria Master's student named Yury Bychov as developer and designer. Since I was busy on other projects, and in any case would not claim to be an expert in the byzantine area of C++ code design, the plan was that Yury would provide any extra design needed to accomodate differences between Java and C++, and I would advise as much as possible on the purely functional design of the API.

We knew that C++'s lacks of automated memory management would be a major challenge, and I also expected that its much more flexible (some would say baroque) features for code organization (e.g. templates, namespaces) would require some careful thought. I had no idea that the porting process would be quite so painful and prolonged, however. In the end it took longer to port JTS to C++ than it did to develop the entirety of JTS up to Version 1.1! [This is the main reason why my strategy for designing and implementing spatial algorithms remains focussed on Java as the development language, with the GEOS port happening only after the algorithms are relatively stable and well-understood. In my opinion C++ is less effective for algorithm design since it requires too much brainpower devoted to irrelevant issues.]

Since its initial release GEOS has been incorporated in PostGIS, of course, and also in MapGuide and apparently in at least a few other C-based projects. GEOS code has been incorporated by Safe Software in their FME product. MapServer has announced plans to incorporate GEOS in their codebase (the status of this is unknown to me). Frank Warmerdam's OGR project I believe uses GEOS.

GEOS also continues to undergo development, both in the orginal mode of direct ports of JTS code, and also to a certain extent with independent functionality (primarily involving making it better adapted to C++ programming styles).

Martin Davis, May 2007

Monotone Chains: the unknown technique

When I was originally developing JTS, I spent quite of bit of time thinking about and researching techniques for efficiently computing arrangements (intersections) of sets of linestrings. There's a relatively large body of literature on this subject, as you might expect since it is such a key aspect of many geometric algorithms. As usual, however, much of this literature is fairly academic in nature, and usually doesn't address some of the more mundane issues which an implementor has to deal with.

One issue that crops up with intersecting sets of linestrings is that there is a tension between increasing performance and decreasing memory use. The fastest way should be to indexing individual line segments. However, this implies creating a new memory structure for each line segment, which can be both time and memory intensive. The most memory-effective structure is of course to leave the original linestrings undivided, but this does not provide good performance.

It seemed to me that there must be an intermediate between these two extremes. After some thought I came up with the idea that I called "monotone chains". A monotone chain is a string of line segments which are monotonic in both X and Y directions. This structure has some nice properties:
  • Monotone chains cannot self-intersect
  • Two monotone chains intersect in at most one connected set of line segments.
  • The envelopes of any two subsequences of a monotone chain are interior-disjoint. This means that the envelopes of successive bisections of the chain form a "pyramid" of envelopes. In turn, this allows comparing two chains for intersection can be carried out using a binary search technique using the two envelope pyramids.
In addition, splitting linear geometry into monotone chains is a good heuristic for partitioning the geometry. Often digitized linework representing natural features contains quite long sequences of segments which are monotone (since the linework follows natural curved features.

I was pretty pleased with my "discovery", and used it to good effect in JTS. For quite a while I was unaware of any prior use of this technique, but recently I came across a reference to a paper by Warren Burton in 1977 which apparently discusses this technique. I was led to this by the paper on Whirlpool by Nick Chrisman et al., which used this technique under the name "monotone sections".

It doesn't come as too much of a surprise that such a simple and useful technique has been used before, but what I do find a bit surprising is how little this technique is discussed in computational geometry literature. I presume the reason for this is that the technique is too simple to be of continuing interest to academics, while it is too special-purpose to be worth presenting to students. There's a dearth of computational geometry literature directed at the true practitioner - which probably indicates how few practitioners there are!

References

Burton, Warren 1977: Representation of many-sided polygons and polygonal lines for rapid processing, Communications ACM, vol. 20, no.3.

Chrisman, N.R., Dougenik, J.A. and White, D. 1992: Lessons for the design of polygon overlay processing from the Odyssey Whirlpool algorithm, Proc. 5th Int. Symp. on Spatial Data Handling, 2, 401-410.

Saturday, June 9, 2007

Quirks of the "Contains" Spatial Predicate

One of the most useful OGC standards is their specification of the Dimensionally-Extended 9 Intersection Model for spatial relationships. (To give credit where is it due, this was originally developed by Egenhofer, Clementini and others). In its most general form this model is quite complex, so to make it usable for mortal programmers, a set of commonly-useful "named predicates" have been specified. These include obvious relationships such as intersects, disjoint, touches, equals, and contains.

But are they really so obvious? In fact not - several of them have subtle aspects to their definition which are contrary to intuition.

In particular, "contains" (and its converse "within") has an aspect of its definition which may produce unexpected behaviour. This quirk can be expressed as "Polygons do not contain their boundary". More precisely, the definition of contains is:

Geometry A contains Geometry B iff no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A

That last clause causes the trap - because of it, a LineString which is completely contained in the boundary of a Polygon is not considered to be contained in that Polygon!

This behaviour could easily trip up someone who is simply trying to find all LineStrings which have no points outside a given Polygon. In fact, this is probably the most common usage of contains. For this reason it's useful to define another predicate called covers, which has the intuitively expected semantics:

Geometry A covers Geometry B iff no points of B lie in the exterior of A

Its converse coveredBy is also useful as well. It's a bit of a mystery why OGC did not define this predicate. In any case, Oracle Spatial does provide this predicate. I have added it to JTS as well.

There's a further bonus to defining the covers predicate. It is much easier to optimize the common use case:

covers(Rectangle, Geometry)

All that is necessary to determine this condition is to perform a simple bounding box comparison. This is not possible with contains, because even if the bounding box of Geometry is covered by the Rectangle, a further expensive operation is required to test if the Geometry lies wholly in the boundary of the Rectangle (in which case the predicate fails).

Covers "simplifies" the defintion of contains by making it more general (inclusive). There's another possibility as well, which is to simplify in the direction of being less inclusive. This would produce a predicate which might be called containsProperly, with the definition:

Geometry A containsProperly Geometry B iff all points of B lie in the interior of A

Interestingly, some recent work I'm doing indicates that this predicate might play a very useful role - but that's a story for another post.


References

DE-9IM: Clementini, Eliseo, Di Felice, and van Osstrom; "A Small Set of Formal Topological Relationships Suitable for End-User Interaction," D. Abel and B.C. Ooi (Ed.), Advances in Spatial Database—Third International Symposium. SSD '93. LNCS 692. Pp. 277-295.

9 Intersection model: M. J. Egenhofer and J. Herring; "Categorizing binary topological relationships between regions, lines, and points in geographic databases," Tech. Report, Department of Surveying Engineering, University of Maine, Orono, ME 1991.