Pages

Monday, September 26, 2011

Reading Shapefiles from the Cloud

In a previous post, I wrote about saving shapefiles using pyshp to file-like objects and demonstrated how to save a shapefile to a zip file. PyShp has the ability to read from Python file-like objects including zip files as well (as of version 1.1.2).  Both the Reader object and the Writer.save() method accept keyword arguments which can be file-like objects allowing you to read and write shapefiles without any disk activity.

In this post, we'll read a shapefile directly from a zip file on a server all in memory.

Normally to read a shapefile from the file system you just pass in the name of the file to the Reader object as a string:

import shapefile
r = shapefile.Reader("myshapefile")

But if you use the keywords shp, shx, and dbf, then you can specify file-like objects.  This example will demonstrate reading a shapefile - from a zip file - on a website.

import urllib2
import zipfile
import StringIO
import shapefile

cloudshape = urllib2.urlopen("http://pyshp.googlecode.com/files/GIS_CensusTract.zip")
memoryshape = StringIO.StringIO(cloudshape.read())
zipshape = zipfile.ZipFile(memoryshape)
shpname, shxname, dbfname, prjname = zipshape.namelist()
cloudshp = StringIO.StringIO(zipshape.read(shpname))
cloudshx = StringIO.StringIO(zipshape.read(shxname))
clouddbf = StringIO.StringIO(zipshape.read(dbfname))
r = shapefile.Reader(shp=cloudshp, shx=cloudshx, dbf=clouddbf)
r.bbox
[-89.8744162216216, 30.161122135135138, -89.1383837783784, 30.661213864864862]

You may specify only one of the three file types if you are just trying to read one of the file types. Some attributes such as Reader.shapeName will not be available using this method.

File-like objects provide a lot of power. However it is important to note that not all file-like objects implement all of the file methods. In the above example the urllib2 module does not provide the "seek" method needed by the zipfile module. The ZipFile read() method is the same way.  To get around that issue, we transfer the data to the StringIO or cStringIO module in memory to ensure compatibility. If the data is potentially too big to hold in memory you can use the tempfile module to temporarily store the shapefile data on disk.

Monday, September 5, 2011

Map Projections

A reader pointed out to me recently that that the pyshp documetnatin or wiki should include something about map projections.  And he is right.   Many programmers working with shapefiles are not necessarily geospatial professionals but have found themselves working with geodata on some project.

It is very difficult to just "scratch the surface" of GIS.  You don't have to dig very deep into this field before you uncover some of the eccentricities of geographic data. Map projections are one such feature that is easy to understand at a basic level but has huge implications for geospatial programmers.

Map projections are conceptually straight-forward and intuitive.  If you try to take any three-dimensional object and flatten it onto a plane, such as your screen or a sheet of paper, the object is distorted.  (Remember the orange peel experiment from 7th grade geography?) You can manipulate this distortion to preserve common properties such as area, scale, bearing, distance, shape, etc. 

I won't go into the details of map projections as there are thousands of web pages and online videos devoted to the subject.  But there are some things you need to know for dealing with them programmatically.  First of all, most geospatial data formats don't even contain any information about map projections.  This lack of metadata is really mostly just geospatial cultural history with some technical reasons.  And furthermore, while the concept of map projections is easy to grasp, the math to transform a coordinate from one projection to another is quite complex.  The end result is most data libraries don't deal with projections in any way.

But now, thanks to modern software and the Internet making data exchange easier and more common, nearly every data format, both images and vector, have tacked on a metadata format that defines the projection.  For shapefiles this is the .prj projection file which follows the naming convention .prj   In this file, if it exists, you will find a string defining the projection in a format called well-known text or WKT.  And here's a gotch that blew my mind as a programmer a long time ago: if you don't have that projection definition, and you don't know who created the data - there is no way you are ever going to figure it out.  The coordinates in the file are just numbers and offer no clue to the projection.  You don't run into this problem much any more but it used to be quite common because GIS shops typically produced maps and not data.  All your coworkers knew the preferred projection for your shop so nobody bothered to create a bunch of metadata.  But now, modern GIS software won't even let you load a shapefile into a map without forcing you to choose a projection if it's not already defined.  And that's a good thing.

If you do need to deal with projections programmatically you basically have one choice: the PROJ4 library.  It is one of the few free libraries, if not the only library period, that comprehensively deals with re-projecting goespatial data.  Fortunately it has bindings for just about every language out there and is incorporated into many libraries including OGR.  There is a Python project called pyproj which provides python bindings.

So be aware that projections are not trivial and can often add a lot of complexity to what would otherwise be a simple programming project.  And also know that pyshp does nothing to work with map projections.  I did an earlier post on how to create a .prj file for a shapefile and why I chose not to include this functionality in the library itself.

Here are some other resources related to map projections.

SpatialReference.org - a clearning house for projection definitions

PROJ4 - the #1 map projection library

OGR2OSM - Python script to convert OGR vector formats to the Open Street Map format with projection support

PyProj - Python bindings for Proj4 library

GDAL - Python bindings to GDAL which contains OGR and PROJ4 allowing you to reporject raster and vector data

Friday, September 2, 2011

Pyshp now available via setuptools

The Python Shapefile Library (a.k.a pyshp) is now properly registered on the Python Package Index so it can be installed via setuptools and, possibly more importantly, registered as a dependency by other packages.  At this time the "usage.py" doctests also download with it but not the sample blockgroups shapefile.  I will eventually rewrite the doctests to create a sample shapefile used to verify the reader class.  It is available as both an egg and source distribution.

I'd like to thank users "memedough" and Sebastian for pushing me to go ahead and set this up because yes I'm that lazy where after 2 years I still hadn't bothered to spend the 15 minutes to do it.

If you have setuptools installed, just run: easy_install pyshp

The setup.py script is also now in the subversion respoistory on google code just so I don't lose track of it.  The PyPi page can be found here:

http://pypi.python.org/pypi/pyshp/