Tuesday, May 29, 2012

SBN Mystery - Solved!

Last October I asked you all for help in figuring out the spatial indexing algorithm used to create Esri sbn files.  Using Pyshp, I had successfully decoded the file formats which I provided. However I could not figure out the algorithm used to create and populate the spatial bins within these files.

Today I'm pleased to announce this challenge has been answered.  The GIS community now has access to both the sbn and sbx file format as well as the algorithm for grouping features in a shapefile into "spatial bins".

I'm glad I asked for help as this challenge turned out to be quite difficult. The brain behind this operation is Marc Pfister with some good insights from Si Parker.  Marc worked tirelessly on this problem for months with a cross-country move and complete career change thrown in to make it interesting.  Marc did all the heavy intellectual lifting with me playing an inquisitive, but usually short-sighted Watson to his Holmes.  I generated endless series of shapefiles and one-off scripts to help Marc try and flush out the algorithm based only on subtle changes in the number of bins and features they contained as well as his past experience with spatial indexing.

When I figured out the file formats I had hoped I was just a Wikipedia search away from recognizing the spatial tree algorithm.  But the solution turned out to be much more complex than that.  Esri uses a sort of balanced tree that exhibits traits of several different algorithms.  The system seems carefully designed but is by no means obvious.  I will publish Marc's findings as soon as I can.

There are still a few shapefile cases which create puzzling but insignificant results. However we are at the 98% mark. The project goal of compatibility has been reached.  There is no longer any reason to hold off on sharing the results.  We are fairly certain that we are able to create sbn and sbx files which sufficiently fool ArcMap as well as other Esri packages so other software can read, use, and generate these indexes alongside the Esri suite.  There is more testing to do but it seems we are out of the woods.

What we haven't done is nicely packaged all of this work up.  But Marc posted a small set of Python scripts on github which demonstrate the algorithm and file handling needed to copy this capability.  Over the coming months I will fold this code into Pyshp, produce better documentation on the algorithm, and provide posts on how to deal with these indexes. But for now here's what you've been waiting for:

By the way, Marc does freelance programming.  In my job, I get the opportunity to work with lots of really bright geospatial programmers and mathematicians and this guy is one of the very best I've ever seen.  If you have a tough geospatial project and need some E=MC2 smarts definitely look him up.

No comments:

Post a Comment