Tuesday, November 5, 2013

Geospatial Python - the Book

I'm pleased to announce that Packt Publishing has officially released my new book, "Learning Geospatial Analysis with Python". Packt already has two other books related to geospatial Python. All three books are featured on the right-side border of this page. I went to great lengths to complement these two existing books with different but relevant information.

My goal for this book is to provide geospatial Python examples with the following priorities:
  1. Use pure Python examples as much as possible (standard library only, or third-party libraries that use only the standard library)
  2. Use PyPI-compatible libraries that can be installed using easy_install or pip
  3. If I must use Python bindings to a third-party compiled library, focus on methods that lack good examples elsewhere and avoid repeating other books and blogs

If you've ever tried to implement even a simple spatial algorithm, you'll find that the vast majority of programmers rely on a relatively small set of core libraries. These libraries are then built up into larger toolkits, which tie them together and add further value.
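As an illustration of the kind of "simple spatial algorithm" meant here, and of priority 1 (standard library only), here is a minimal sketch of the classic ray-casting point-in-polygon test. It is not taken from the book. The function name `point_in_polygon` and the list-of-tuples polygon format are assumptions made for this example, and points lying exactly on an edge may be classified either way.

```python
def point_in_polygon(x, y, polygon):
    """Return True if point (x, y) lies inside polygon (ray-casting test).

    polygon is a list of (x, y) vertex tuples. The ring may be open or
    closed; a repeated closing vertex just adds a zero-length edge that
    never counts as a crossing. Points exactly on an edge may go either way.
    """
    inside = False
    j = len(polygon) - 1  # index of the previous vertex, starting with the last
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can be crossed
        if (yi > y) != (yj > y):
            # x coordinate where this edge meets the horizontal line through y
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            # Count crossings of a ray cast from the point toward +x
            if x < x_cross:
                inside = not inside
        j = i
    return inside


if __name__ == "__main__":
    # An L-shaped (concave) polygon
    l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
    print(point_in_polygon(2, 7, l_shape))  # True
    print(point_in_polygon(7, 7, l_shape))  # False: in the notch of the L
```

An odd number of edge crossings means the point is inside. The test needs nothing beyond basic arithmetic, which is why it is a common first algorithm to write in pure Python.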

The following diagram is a high-level summary of the geospatial software ecosystem, and thereby of the Python geospatial ecosystem. The "800-pound gorillas", including GEOS, PROJ.4, and GDAL, are the foundation of nearly all other software. CGAL and JTS stand apart from the others in this diagram: both are the reference implementations for many of the spatial algorithms later used by GEOS.

[Figure: The geospatial software industry consists of a small handful of open-source libraries which support the vast majority of all other software, both proprietary and open-source.]

These libraries:
  1. Are actively maintained and free (open source!)
  2. Implement operations and data access complex enough that nobody wants to reinvent the wheel

I have used these libraries for years. But I am also stubbornly curious. I like to understand the algorithms inside the proverbial black box. My favorite way to understand algorithms is through Python. I do not have a computer science degree or a degree in mathematics. I know just enough to be dangerous in almost any programming language.

But what I like about Python is the syntax. It is so clean and expressive that it makes hard things easier. I am not alone. Scores of non-programmer biologists, meteorologists, economists, medical researchers, artists, and other specialists have found Python a user-friendly way to model and solve both trivial and complicated real-world problems.

But there is a large gap in the literature. For any given problem, you can almost always find pure algorithms in research papers or implementations in C/C++ libraries. But you can rarely find working, practical implementations in Python. Of course, many of the compiled libraries are open source, but their structure is complex enough that the casual learner must invest significant time in understanding most of the code base just to uncover a given algorithm.

The most common reason cited for using compiled libraries instead of pure Python is that interpreted languages like Python are far too slow to be efficient and useful for most geospatial operations. I find this argument unsatisfying for the following reasons:
  1. It discredits Python as a learning tool.  There's no need for speed in learning.
  2. As server platforms become increasingly virtualized, multiplied, and abstracted, we can keep relaxing our view of resource constraints in exchange for faster development with more participants.
  3. There are many tools out there that make pure Python faster when you are ready to optimize.
  4. As geospatial applications become more task-specific, implementing a handful of algorithms becomes more desirable than using a "kitchen-sink" toolkit.
  5. As mobile devices become increasingly important, dragging around a large compiled tool chain becomes less desirable.
  6. When you bind Python to a compiled library, and both are updated regularly, you take on an additional maintenance routine across multiple platforms. This issue varies in difficulty, but it's never easy. This concern does not apply to compiled libraries designed specifically for Python (NumPy, PIL, Fiona, Shapely).
  7. To me, Python's greatest value is as a meta-syntax. The Python C interpreter alone runs on 30+ platforms. But the Python C interpreter is often called a "reference implementation". Python has been implemented in many other languages and runtimes, including Java (Jython), .NET (IronPython), and Python itself (PyPy). So why do people go to the trouble of porting Python to so many places? Because the programmatic expression the syntax allows is just great. I always liked the metaphor "Python is executable pseudocode". The excellent design of the language from a user's standpoint is why nearly every geospatial platform, proprietary or open-source, offers a Python API rather than some other option.

I originally wrote the PyShp library to learn more about both Python and shapefiles. I kept it simple by releasing a single file that uses only the standard library. Since its release, PyShp has become extremely popular. It proved to me that there is indeed a demand for spatial software written in pure Python without external dependencies.
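To show what "only the standard library" can look like in practice, here is a minimal sketch (not PyShp's actual code) that uses just the `struct` module to parse the 100-byte main header of a `.shp` file. It follows the layout in the ESRI Shapefile Technical Description: big-endian file code and file length, then little-endian version, shape type, and bounding box. The function names `read_shp_header` and `make_header` are hypothetical and exist only for this illustration.

```python
import struct


def read_shp_header(header):
    """Parse the 100-byte main file header of an ESRI shapefile (.shp)."""
    if len(header) < 100:
        raise ValueError("shapefile header must be at least 100 bytes")
    # Bytes 0-3: file code, big-endian, always 9994
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile: file code %d" % file_code)
    # Bytes 24-27: total file length in 16-bit words, big-endian
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version (1000) and shape type, little-endian
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box Xmin, Ymin, Xmax, Ymax as little-endian doubles
    bbox = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": shape_type,
        "bbox": bbox,
    }


def make_header(shape_type, bbox, file_length_bytes=100):
    """Build a synthetic 100-byte header (hypothetical helper for testing)."""
    return (
        struct.pack(">i", 9994)                # file code
        + b"\x00" * 20                         # five unused big-endian integers
        + struct.pack(">i", file_length_bytes // 2)
        + struct.pack("<ii", 1000, shape_type)
        + struct.pack("<4d", *bbox)            # Xmin, Ymin, Xmax, Ymax
        + struct.pack("<4d", 0.0, 0.0, 0.0, 0.0)  # Zmin, Zmax, Mmin, Mmax
    )


if __name__ == "__main__":
    info = read_shp_header(make_header(5, (-10.0, -5.0, 10.0, 5.0)))
    print(info["shape_type"], info["bbox"])
```

With a real file you would pass in the first 100 bytes, e.g. `read_shp_header(open("example.shp", "rb").read(100))`, where `example.shp` is a placeholder name. Shape type 5 is a polygon in the specification.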

"Learning Geospatial Analysis with Python" is my humble attempt at laying a plank in the bridge between abstract algorithms and implementations that are as close to pure Python as I could make them. I took the word "learning" seriously. I don't try to teach Python as a language because that has already been done, and done well. I explore common geospatial concepts early on and move into more advanced (and useful!) algorithms later in the book.

So if you're new to geospatial analysis, the first few chapters try to ease you into the field from a software and data perspective before switching to Python. If you're an experienced geospatial developer, I included some interesting examples of virtually undocumented features of GDAL in Python, as well as examples of using NumPy for operations that are normally handed off to GDAL or SciPy. This book contains more examples of remote sensing using Python than any other book I've come across yet.

The book is available from Amazon, O'Reilly, and Barnes & Noble as both a paperback and all the major eBook formats.  I hope you find it useful and I hope it increases the use of Python for geospatial analysis and applications.
