Thursday, 6 May 2010

Random Points in Polygon in JTS

Recently there was a thread on the PostGIS list about how to create "dot-density" maps. Essentially this involves creating a set of N randomly-placed points which lie within a given polygon.

JTS already has a lot of random shape creation functions provided in the TestBuilder (randomPoints, randomPointsInGrid, randomRadialPoints, randomRectilinearWalk, etc). But they aren't currently exposed as a class in the API. This seemed like a good use case to initiate the development of such a class.

So now JTS has a RandomShapeFactory class. The class allows setting an extent using either a rectangular Envelope or a polygonal geometry, and will create sets of N points within the defined extent.

Here's a screenshot of the TestBuilder showing the new function at work:


Update: it's been pointed out that what is really desired is to have the points evenly distributed through the polygon. This will take a bit more thinking... At least the API won't have to change much to support this.

Monday, 26 April 2010

Late night link roundup

Here's a few interesting links that occupied my late-night browsing...

Facebook's Graph API: The Future Of Semantic Web?

This is interesting for two reasons. One is that this is a hardball play by Facebook to subvert many other sites whose business model is linking social networking to categories of cultural artifacts. The other is the implications for the Semantic Web. I'm not so sure that the latter is quite so easily accomplished, but perhaps the 80-20 rule will truly turn out to be key here.

Mahout 0.3: Open Source Machine Learning

Neat stuff. Pulls together some fascinating technologies like Hadoop and clustering techniques. Perhaps the best thing about this is the chance to learn more about how this stuff actually works in practice. It would be great if there's a spatial component to this.

Colt - a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java. One more nail in the old "Java is too slow" canard.

Wednesday, 7 April 2010

Twitter heart JTS

At least, that's what it looks like from their presentation on Handling Real-Time Datastreams at the recent Where 2.0.

They note that JTS does not have support for GeoRSS or GeoJSON. Actually there is a GeoJSON implementation sitting in the labs - but it really needs some funding to get it finished off (hint, hint, Twitter - what, you think this groovy open-source spatial stuff just codes itself?)

And as usual people have not noticed that the real source for JTS information is on the new home page and the Sourceforge site - not this stale web page. Sigh... if only URLs had expiry dates.

Wednesday, 10 March 2010

Tuesday, 9 March 2010

More Open-Source Geocoders

To continue my previous post on open-source geocoders, here's a few more geocoding projects we've reviewed here at Refractions:

  • PAGC Postal Address GeoCoder ( ) is "a library and a CGI based web service written in ANSI C that uses an address-ranged street network shapefile". It uses a rule-based parser based on the Aho-Corasick string searching algorithm. The parser rules are user-configurable, which is nice (although the rule format is NON-user-friendly, consisting of opaque lists of integers!). Exact match, Soundex and Edit distance are used in the matching phase. Supported reference road networks include both the TIGER and the StatsCan networks. BerkeleyDB is used as the reference network data store.
  • The USC WebGIS Geocoder provides a free, size-limited geocoding service. It claims to be open source, however links to the source code are not obviously provided. It is documented as using a "rule-based parser", but it's not clear how a user could actually customize this and run their own instance. Matching uses attribute relaxation, substring matching, and Soundex. The reference dataset appears to be TIGER, stored in a MS SQLServer database.
  • The FEBRL Geocoder is a well-researched, well-documented system implemented in Python. It targets Australian road network data. It specifically does not attempt to work with North American data (but suggests that the address models are close enough that this would be possible.) The address parser is unique in using a trainable Hidden Markov Model, and also in being documented by a series of academic papers (e.g. [1] ) describing the approach in detail. An address cleaning module is supplied. Matching uses exact or "approximate matching".
  • The OpenGeocoder initiative appears to be a worthy attempt to create a geocoder under the auspices of OpenGeo (possibly as a port of PAGC?). However, this project has not had much recent activity, and doesn't appear to provide any actual code.
One salient aspect of these systems is that they provide address parsing algorithms which are based on well-understood parsing theory. This is of particular interest for our geocoder project - of which more later.


References

[1] A probabilistic geocoding system utilising a parcel based address file; CHRISTEN Peter, WILLMORE Alan, CHURCHES Tim; Data mining : ( theory, methodology, techniques, and applications ), 2006

Saturday, 6 March 2010

Open Source Geocoders

One of the more interesting projects we have going on here at Refractions is to build a geocoder for use in a crime-mapping application we are developing for a client. We do have an existing geocoder codebase developed for another project. But we're not 100% happy with its performance and customizability, so we decided to look into developing a new library specifically for this project.

Of course the first thing we did was carry out a technology review of all the open-source geocoders we could find. Here's a list of all the ones we looked at:
  • Geo-Coder-US - A Perl module developed by the ubiquitous Schuyler Erle. "For geocoding US addresses, that is, estimating the latitude and longitude of any street address or intersection in the United States, using the TIGER/Line data set". Probably no longer being developed, since it has been superseded by
  • GeoCommons Geocoder::US - a rewrite of Geo-Coder-US into Ruby (and also requiring C and SQLite). "Although it is primarily intended for use with the US Census Bureau’s free TIGER/Line dataset, it uses an abstract US address data model that can be employed with other sources of US street address range data"
  • JGeoCoder - A Java API loosely modelled after Geo::Coder::US. Works against a SQL database loaded with TIGER data (an H2 image is supplied). Last activity in 2008.
  • Explorer GeoCoder by SRC - A C++ library for "a data and country independent geocoding engine" which can "assign latitude and longitude coordinates to any United States street address or intersection". Has an active mailing list.
  • Frost Tiger Geocoder by Stephen Frost et al - a Postgres SQL library for geocoding against TIGER data
(This is summarized in tabular form on this Tsusiat page).

Some observations:
  • All of the engines implement parsing and matching logic purely in code. None of them provide a declarative description language to allow easy modification of parsing, standardization, and matching rules. (To be fair, this is bit of a tall order. And it's not clear that it's even possible to provide an understandable declarative language for the fully general case. For example, the ArcMap geocoder (which appears to be the old MatchWare engine) provides a geocoding definition language (actually 5 different ones) - but the languages look scarily complex! Nonetheless, this is an important feature for easy of maintenance and customization.)
  • JGeoCoder uses a large number of complex regular expressions to perform parsing. This looks like it would be difficult to customize, due to the well-known opaqueness of large REs, and perhaps also to the relative inflexibility of the RE paradigm
  • The GeoCoder::US Ruby module seems to be the simplest code base. (I ended up almost understanding its parsing algorithm 8^) It uses REs, but in a saner amount. However, it's unclear how well it deals with erroneous input data, and how easy it would be to modify for a different address model.
  • The Explorer geocoder uses a large amount of fairly complex C++ code. It also looked quite challenging to understand and modify.
  • In all the projects the parser design appears to be fairly ad-hoc and poorly documented. This situation doesn't inspire confidence that it would be possible to modify the parser to support a different address model, or to handle particular kinds of input errors. (GeoCoder::US is a possible exception to this - it has a relatively simple parsing algorithm with at least some documentation).
In the end we decided not to use any of these projects. I'll talk about what we did do in another post.

Monday, 1 March 2010

JTS Version 1.11 released!

JTS Version 1.11 is now available for download from SourceForge.


The version contains numerous enhancements, including

  • Delaunay triangulation and Voronoi diagrams
  • AWT Shape reading and writing
  • Geometry similarity metrics
  • support for Geometry densification
  • Numerous improvements to the JTS TestBuilder

Full release notes:

Functionality Improvements

  • Added CoordinateArrays.isRing
  • Added CGAlgorithms.signedArea(CoordinateSequence)
  • Added CoordinateArrays.copyDeep(...) method to copy sections of arrays
  • Added CoordinateList.add(Coordinate[], boolean, int, int) method to add sections of arrays
  • Added LineSegment.toGeometry(), LineSegment.lineIntersection()()
  • Added LineSegment.hashCode()
  • Added geometric similarity classes (HausdorffSimilarityMeasure, AreaSimilarityMeasure)
  • Added MinimumDiameter.getMinimumRectangle()
  • Added MinimumBoundingCircle class
  • Added Densifier class
  • Added triangulation API, including QuadEdgeSubdivision, IncrementalDelaunayTriangulator, ConformingDelaunayTriangulator and supporting classes
  • Added VoronoiDiagramBuilder to perform Voronoi diagram generation
  • Added scaleInstance(scaleX, scaleY, x, y) to AffineTransformation
  • Added AffineTransformationFactory to allow generating transformations from various kinds of control inputs
  • Added BinTree.remove() method
  • Fixed BinTree.query() to allow null interval arguments
  • Added ShapeReader API to convert Java2D Shapes into JTS Geometry
  • Added ShapeWriter API to convert JTS geometry into Java2D Shapes
  • Added FontGlyphReader API to render Java2D text font glyphs into geometry
  • Added SdeReader to jtsio library
  • Added Debug break methods
  • Added Memory utility for reporting memory statistics
  • Added ObjectCounter utility for counting objects
  • Added createSquircle and createSuperCircle to GeometricShapeFactory

Performance Improvements

  • Improved performance of Geometry.getArea() and Geometry.getLength() when used with custom CoordinateSequences

API Changes

  • Deprecated WKBWriter.bytesToHex in favour of WKBWriter.toHexto regularize and simplify method name

Bug Fixes

  • Fixed Point.isValid() to check for invalid coordinates (ie with Nan ordinates)
  • Fixed Geometry.distance() and DistanceOp to return 0.0 for empty inputs
  • Fixed Buffer to handle degenerate polygons with too few distinct points correctly
  • Added illegal state check in LineSegment.pointAlongOffset()
  • Fixed exception strategy in BufferSubgraph to handle certain robustness failures correctly
  • Fixed robustness problem in OffsetCurveBuilder in computing mitred joins for nearly parallel segments
  • Fixed minor bug in BufferInputLineSimplifier which prevented simplification of some situations
  • Fixed bug in BufferInputLineSimplifier which caused over-simplification for large tolerances
  • Fixed bug in Angle.normalizePositive to handle values > 2PI correctly
  • Fixed WKTWriter to emit correct syntax for MULTIPOINTs
  • Fixed WKTReader to accept correct syntax for MULTIPOINTs
  • CGAlgorithms.isCCW now checks for too few points in the ring and throws an IllegalArgumentException
  • Fixed bug in AffineTransformation#eqals (logic bug)
  • Fixed bug in CoordinateList#closeRing (cloning closing Coordinate)

JTS TestBuilder

Functionality Improvements

  • WKT input is cleaned automatically when loaded (illegal chars are removed)
  • Added WKT-Formatted option to Test Case View dialog
  • Many new geometry functions added
  • Geometry functions are displayed in tree
  • Geometry functions can be implemented as Java static class methods.
  • Geometry function classes can be loaded dynamically from command-line
  • Improved handling of very large geometry inputs and results
  • Threaded rendering allows display of very large geometries without limiting usability
  • Added Draw Rectangle tool
  • Added Drag-n-drop loading of .SHP files
  • Added Info tool to provide persistent display of geometry point/segment information
  • Added display of memory usage

Wednesday, 17 February 2010

JTS and Google App Engine

Nice to see that JTS is officially certified as compatible with Google App Engine.

JTS makes clouds easy!

A few people have tinkered with running JTS in GAE (eg here and here). But the killer app has yet to emerge... How about a Java-based App Engine version of this?

JTS & Jython

Sean's post suddenly made me realize that it might be really cool to have a nice set of Jython bindings for JTS (a la what Shapely has done for GEOS).

Now, of course Jython can call Java classes directly, so maybe this is an empty task. But perhaps there are some impedance mismatches which could be removed via some simple glue code. And it would be nice to provide some examples of Jython code using JTS, to make it easier for people to figure out how to get started.

And of course I'd throw in Proj4J as well!

One more thing for my ever-growing task list...

If anyone has any ideas on what needs doing here, I'd like to hear them.

(This also makes me think that both Jython and JTS need some cool logos... I don't have any pictures to post!)

Friday, 5 February 2010

Opposites attract?

This sounds like

Fine Swiss shade-grown organic chocolate and rancid peanut butter from the big-box store discount aisle

HatBox for Derby and H2

I just saw the HatBox spatial extension to Derby and H2. (Cute name - Derby, HatBOX - get it?)


And of course HatBox uses JTS!

This is great - H2 is a fantastic pure Java database, which has been crying out for spatial support. (Perhaps if this extension proves its worth it will be merged into the main H2 codebase at some point?)

However, HatBox is still a "user-space extension" - which means that it has not enhanced the SQL evaluation engine itself with knowledge about spatial indexes and when to use them. So to utilize spatial indexing you have to explicitly join to the spatial index table, which results in ugly SQL like this:

select ID, GEOM from T1 as t
inner join
HATBOX_MBR_INTERSECTS_ENV('PUBLIC','T1',145.05,145.25,-37.25,-37.05) as i
on t.ID = i.HATBOX_JOIN_ID

This is the same approach used in Sqlite and ESRI SDE and other spatial extensions which operate in user space rather than DB engine space (Mike Butler of SDBE fame used to call this "staying above the blood-brain barrier" 8^). Basically you are adding the index filter condition which for scalar indexes the SQL engine adds automatically.

In contrast, the "big boys" like PostGIS, Oracle, SQL Server, DB2, Informix, etc have actually extended their database engine to handle spatial datatypes and indexes. Most of these systems actually have provided a general extensibility mechanism which allows a clean separation between the engine core and the new datatypes. PostgreSQL is probably the one which takes this to the ultimate extent.

User-space spatial extensions are for a first approach, but it would be really nice to be able to play with the big boys and incorporate knowledge of spatial indexes and functions directly into the database engine. This should be easierto do in Java than in C - are you listening, H2?

Tuesday, 2 February 2010

SlashGeo'ed!

Woohoo! I got SlashGeo'ed!

Tagged with cool icons as Software, Open Source, and - er - Geocoding?

Saturday, 30 January 2010

Announcing Proj4J

It's been a while since I posted anything, because like a true programmer I've been too busy cutting code to write about it. Time to catch up!

My main focus for a while has been working on a Java port of the popular PROJ.4 projection library. This was motivated by finding the great work that JHLabs started with his JavaProj port. That seemed too good to be left to languish, so I dusted it off and started to clean it up. The library is now known as Proj4J. It has been substantially reworked to provide a more functional API and a cleaner codebase. Various bug fixes have been made to the core projections (although more remains to do!), and a formal test suite has been added.

To answer the question of "Why another Java projection library?", the main reason is that PROJ.4 is popular, well-tested and well-documented, so it seems like a good idea to make it available in the Java world. Other goals include creating a small, easy-to-understand, easy-to-use library, which can be embedded and/or extended as desired by small projects. A personal goal is to provide coordinate system transformation services in JEQL (which is now realized by exposing Proj4J via JEQL functions).

OSGeo is hosting the codebase as part of the MetaCRS umbrella project. There it lives in the good company of Proj4JS, CS-MAP, spatialreference.org and of course PROJ.4.

The code is shaping up nicely for a 1.0 release. It can also be downloaded and used as it stands. Check it out! Even better, contribute some tests and bug fixes!


Wagner VII Projection

Friday, 2 October 2009

Why you should not sit on Wolfram Alpha

Unqualified Reservations has a great post dissecting the Wolfram Alpha hubristic User Interface.

Warning: this guy seems to be able to write faster than most people can read. Browse his blog at your peril!

Monday, 21 September 2009

OS and GM SXS for LEJOG in the UK

Translation: Ordnance Survey and Google Maps side-by-side for planning the Land's End-John O'Groats walk in the United Kingdom.

My friend Steve is walking the LEJOG in 2010 (I guess that makes him a LEJOGger - as opposed to the JOGLErs who go North to South). Of course he is blogging about it - doesn't everybody?.

Even better, he's planning his route using Where's The Path, which is a fantastic side-by-side map site for the U.K. It shows Ordnance Survey topographic maps and Google Maps imagery in a cleverly linked view (similar to the VExGM web site I blogged about earlier this year, but even more useful). Here's a link to the most famous location in Britain (but I suspect it doesn't feature in many LEJOG itineraries, unless you're a real anorak for walking).


WTP also lets you digitize paths and export them to GPX and KML. Steve is finding the KML export particularly useful, since it allows him to locate all the pubs along the route of his walk.

Now, when is someone going to do this for Canada? All those prospective SJV'ers out there would be overjoyed!

Tuesday, 15 September 2009

Earn a diploma with JTS!

(No doubt by choosing that title I've exposed by my blog to death by spam filter. Although if the ones with titles like "Increase your performance with JTS!" got through, maybe I'm ok.)

The Kungliga Tekniska högskolan in Sweden (isn't Unicode wonderful?) is offering a course on GIS Architecture. I was happy to see that they're using JTS as a tool for exploring the DE-9IM spatial relationship model, and also as an example of a Open Source geospatial library.


Tuesday, 25 August 2009

JTS - Links to Geometry APIs and papers

Motivated by a thread on the GEOS list about the Boost.Polygon library, I've added some new resources to the JTS web site, including:
  • a list of geometry APIs
  • a section of references to papers about topics in Computational Geometry which are relevant to JTS.
The resources have been factored out onto this page.

Over the years of working with JTS I've read a ton of interesting and sometimes even useful academic papers about aspects of Computational Geometry. I hope to start documenting these on this page, perhaps with some (no doubt opinionated) comments about applicability. A treasure trove for Geometry geeks!

Wednesday, 19 August 2009

Super Zoom Control is really super!

The web mapping site 192.com has a very nice map zoom control which dynamically displays a representative image of the map at the prospective zoom level.


This nicely addresses a problem which has occurred in a few applications I've been involved with recently - how to communicate to the user what a given zoom level actually looks like.

At first I thought the control might be truly dynamic - i.e. pulling a live tile from the centre point of the current map view. But it's actually just a set of tile snippets at a fixed point. Still, it provides a very nice effect. (The dynamic version would be simple to implement, but perhaps would be too slow in actual use, with unpredicatable latency).

Are you listening, OpenLayers?

Thursday, 6 August 2009

Visualizations of Sorting Algorithms in your browser

Here's a nifty web page which shows a way of visualizing the operation of various common sorting algorithms.

Sample output for Heapsort:



The coolest thing about this of course is the ability to dynamically generate rich graphics in the browser. All hail the HTML5 <canvas> element!

Tuesday, 9 June 2009

Meme of the Month

Well, it's obvious, innit?


You can be the first to write the Wikipedia article!